A set of independence statements may define the independence structure of interest in a family 
of joint probability distributions. This structure is often captured by a graph that consists of 
nodes representing the random variables and of edges that couple node pairs. One important class 
contains regression graphs. Regression graphs are a type of so-called chain graph and describe 
stepwise processes, in which at each step single or joint responses are generated given the relevant 
explanatory variables in their past. For joint densities that result after possible marginalising 
or conditioning, we introduce summary graphs. These graphs reflect the independence structure 
implied by the generating process for the reduced set of variables and they preserve the implied 
independences after additional marginalising and conditioning. They can identify generating 
dependences that remain unchanged and alert to possibly severe distortions due to direct and 
indirect confounding. Operators for matrix representations of graphs are used to derive these 
properties of summary graphs and to translate them into special types of paths in graphs. 

Keywords: concentration graph; directed acyclic graph; endogenous variables; graphical 
Markov model; independence graph; multivariate regression chain; partial closure; partial 
inversion; triangular system 

1. Motivation, some previous and some of the new results

1.1. Motivation 

Graphical Markov models are probability distributions defined for a $d_V \times 1$ random vector
variable $Y_V$ whose component variables may be discrete, continuous or of both types and
whose joint density $f_V$ satisfies the independence statements specified directly by an
associated graph as well as those implied by the graph. The set of all such statements is
the independence structure captured by the graph.

One such type of graph was introduced for sequences of regressions by Cox and Wermuth
(1993, 1996); special results for it have been derived by Drton (2009), Kang and Tian (2009),
Marchetti and Lupparelli (2011), Wermuth and Cox (2004), Wermuth, Wiedenbeck and Cox (2006),
Wermuth, Marchetti and Cox (2009) and Wermuth and Sadeghi (2011).

A regression graph consists of nodes, say in set V, that represent random variables,
and of edges that couple node pairs such that a recursive order of the joint responses is
reflected in the graph. Associated discrete distributions have some desirable properties
derived by Drton (2009). Each defining independence constraint respects the given recursive
ordering of the joint responses; see Marchetti and Lupparelli (2011). This feature
distinguishes regression graphs from all other currently known types of chain graphs and
permits one to model data from both interventional and observational studies.

Because of this property, regression graphs are particularly well suited to the study 
of effects of hypothesized causes on sequences of joint responses; see Cox and Wermuth 
(2004). More generally, they can model developmental processes, such as in panel studies. 
These provide data on a group of individuals, termed the 'panel', collected repeatedly, say 
over years or decades. Often one wants to compare corresponding analyses with results 
in other studies that have core sets of variables in common, but that have omitted some 
of the variables or that were carried out for subpopulations. 

It is an outstanding feature of regression graph models that their implications can be
derived after marginalising over some variables, say in set M, or after conditioning on
others, say in set C. In particular, graphs can be obtained for node set $N = V \setminus \{C, M\}$
that capture precisely the independence structure implied by a generating graph in node
set V for the distribution of $Y_N$ given $Y_C$.

Such graphs are called independence-preserving when they can be used to derive the
independence structure that would have resulted from the generating graph by conditioning
on a larger node set $\{C, c\}$ or by marginalising over a larger node set $\{M, m\}$. Two
such classes are known. One is the subclass of the much larger class of MC graphs
of Koster (2002) that can be generated by a regression graph in a larger node set. The
other contains the MAGs (maximal ancestral graphs) of Richardson and Spirtes
(2002). We speak of two corresponding graphs if they result from a given generating
graph relative to the same conditioning and marginalising sets.

A third class of this type is the summary graph of Wermuth, Cox and Pearl (1994).
This class is presented in the current paper in simplified form together with proofs based
on operators for binary matrix representations of the graphs. In contrast to a MAG,
a corresponding summary graph can be used to identify those dependences of a given
generating process for $Y_V$, with $V \supset N$, that remain undistorted in the corresponding
MAG model for $Y_N$ given $Y_C$ and those that may be severely distorted. This is especially
helpful at the planning stage of studies when alternative sets M and C are considered
given a hypothesized generating graph in $V \supset N$. The annotated undirected graphs of Paz
(2007), for C empty, serve a similar purpose.

The warning signals for distortions provided by summary graphs are essential for
understanding the consequences of a given data generating process with respect to dependences
in addition to independences. For this, some special properties of the types of generating
graph will be introduced, as well as specific requirements on the types of generating
process. These lead to families of distributions that are said to be generated over parent
graphs.
1.2. Some notation and concepts

Some definitions for graphs are almost self-explanatory. If a pair $i \neq k$ of V is coupled by
a directed edge such that an arrow starts at node k and points to node i, then k is named
a parent of i and i an offspring of k. For two disjoint subsets $\alpha$ and $\beta$ of V, an ik-arrow,
$i \leftarrow k$, is said to point from $\beta$ to $\alpha$ if the arrow starts at a node k in $\beta$ and points
to a node i in $\alpha$. For three or more nodes, an ik-path connects the path endpoint nodes
i and k by a sequence of edges that couple its inner nodes. Nodes other than the endpoint
nodes are the inner nodes of a path; only the inner nodes have to be distinct. An
ik-path with i = k is a cycle.

An edge is regarded as a path without inner nodes. Both a graph and a path are called
directed if all their edges are arrows. If all arrows of a directed ik-path point towards node i,
then node k is named an ancestor of i and i a descendant of k. Such a path is also called
a descendant-ancestor path.

Directed acyclic graphs form an important subclass of regression graphs. They arise
from stepwise generating processes of exclusively univariate response variables; see
Section 2 below. These graphs have no directed cycles.
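The notions of ancestor and of a directed cycle can be made concrete with a small sketch. It assumes, hypothetically, that a directed graph is stored as a Python dict mapping each node to the set of its parents; the function names `ancestors` and `is_acyclic` are illustrative only. The acyclicity check uses Kahn's topological-sort algorithm, a standard technique that is not taken from this paper.

```python
from collections import deque

def ancestors(parents, i):
    """All nodes k with a directed path k -> ... -> i, i.e. the ancestors of i."""
    seen, stack = set(), list(parents.get(i, ()))
    while stack:
        k = stack.pop()
        if k not in seen:
            seen.add(k)
            stack.extend(parents.get(k, ()))
    return seen

def is_acyclic(parents):
    """Kahn's algorithm: there is no directed cycle iff all nodes can be
    removed one by one, each only after all of its parents were removed."""
    nodes = set(parents) | {k for ps in parents.values() for k in ps}
    n_par = {i: len(parents.get(i, ())) for i in nodes}
    children = {i: [] for i in nodes}
    for i, ps in parents.items():
        for k in ps:
            children[k].append(i)
    queue = deque(i for i in nodes if n_par[i] == 0)
    removed = 0
    while queue:
        k = queue.popleft()
        removed += 1
        for i in children[k]:
            n_par[i] -= 1
            if n_par[i] == 0:
                queue.append(i)
    return removed == len(nodes)

# Hypothetical graph: 2 -> 1, 3 -> 1, 4 -> 2, 4 -> 3
parents = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
print(sorted(ancestors(parents, 1)))   # [2, 3, 4]
print(is_acyclic(parents))             # True
print(is_acyclic({1: {2}, 2: {1}}))    # False: 1 -> 2 -> 1 is a directed cycle
```

Storing parent sets rather than child lists mirrors the way the densities below are written, as conditional densities of each node given its parents.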

As we shall show, two different types of undirected graph, named covariance graphs and
concentration graphs, are subclasses of regression graphs. For joint Gaussian distributions,
they give models for zero constraints on covariances or on concentrations, respectively;
see Wermuth and Cox (1998) and (2.5) and (2.16) below. To distinguish between them in
figures, edges in concentration graphs are shown as full lines, i ── k, and edges in
covariance graphs as dashed lines, i - - - k.

Separation criteria provide what is called the global Markov property of a graph, since
they give all independence statements that belong to the graph's independence structure.

Definition 1. A graph, consisting of a node set and of one or more edge sets, is an 
independence graph if node pairs are coupled by at most one edge and each missing edge 
corresponds to at least one independence statement. 

Regression graphs and MAGs are independence graphs but, in general, summary 
graphs, ancestral graphs and MC graphs are not, even with at most one edge for each 
node pair; see the discussion of Figure 3(b) below. 

The same graph-theoretic notion of separation applies to both types of undirected
graph. Let $\alpha$ and $\beta$ be two non-empty, disjoint subsets of their node set V and let
$\{\alpha, \beta, m, c\}$ partition V; then we write "$Y_\alpha$ is conditionally independent of $Y_\beta$ given $Y_c$"
compactly as $\alpha \perp\!\!\!\perp \beta \mid c$. In a concentration graph, $\alpha$ is separated by c from $\beta$ if every
path from $\alpha$ to $\beta$ has a node in c, while in a covariance graph, $\alpha$ is separated by m
from $\beta$ if every path from $\alpha$ to $\beta$ has a node in m. Given separation of $\alpha$ and $\beta$ by set c,
a concentration graph implies $\alpha \perp\!\!\!\perp \beta \mid c$; see Lauritzen (1996). Given separation of $\alpha$ and $\beta$
by set m, a covariance graph implies $\alpha \perp\!\!\!\perp \beta \mid c$; see Kauermann (1996), who expresses the
result in a different but equivalent way.
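The two separation criteria differ only in the set that has to block every path: c for a concentration graph and $m = V \setminus \{\alpha, \beta, c\}$ for a covariance graph. The following is a minimal sketch under the assumption that an undirected graph is given as a symmetric dict from each node to its set of neighbours; `separated` and `implied_independence` are hypothetical helper names, and the search is an ordinary graph search that never passes through a blocking node.

```python
def separated(adj, alpha, beta, blocking):
    """True if every path from alpha to beta in the undirected graph `adj`
    (dict: node -> set of neighbours) has an inner node in `blocking`."""
    alpha, beta, blocking = set(alpha), set(beta), set(blocking)
    reached, frontier = set(alpha), list(alpha)
    while frontier:                      # search outward from alpha
        i = frontier.pop()
        for k in adj.get(i, ()):
            if k in beta:                # an unblocked path has reached beta
                return False
            if k not in reached and k not in blocking:
                reached.add(k)           # never pass through a blocking node
                frontier.append(k)
    return True

def implied_independence(adj, alpha, beta, c, kind):
    """Does the graph imply alpha _||_ beta | c?  A concentration graph
    separates by c, a covariance graph by m = V minus {alpha, beta, c}."""
    alpha, beta, c = set(alpha), set(beta), set(c)
    if kind == "concentration":
        blocking = c
    elif kind == "covariance":
        blocking = set(adj) - alpha - beta - c
    else:
        raise ValueError(kind)
    return separated(adj, alpha, beta, blocking)

# Chordless 4-cycle X - Z - Y - U - X: the pairs (X, Y) and (Z, U) are uncoupled
adj = {"X": {"Z", "U"}, "Z": {"X", "Y"}, "Y": {"Z", "U"}, "U": {"Y", "X"}}
print(implied_independence(adj, {"X"}, {"Y"}, {"Z", "U"}, "concentration"))  # True
print(implied_independence(adj, {"X"}, {"Y"}, set(), "covariance"))          # True
print(implied_independence(adj, {"X"}, {"Y"}, {"Z", "U"}, "covariance"))     # False
```

The same skeleton thus gives $X \perp\!\!\!\perp Y \mid ZU$ when read as a concentration graph but the marginal statement $X \perp\!\!\!\perp Y$ when read as a covariance graph.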

When a graph is directed or contains different types of edge, its separation criterion
is more complex than the one for undirected graphs. For directed acyclic graphs, there are
several different separation criteria that permit us to obtain all independence statements
implied by the graph; see Marchetti and Wermuth (2009) for proofs of equivalence.

N. Wermuth

The criterion due to Geiger, Verma and Pearl (1990) has been extended in almost
unchanged form by Koster (2002) to the much larger class of MC graphs. A path-based
proof, due to Sadeghi (2009), covers the subclass of MC graphs of interest here, namely
the MC graphs that can be derived from a larger directed acyclic graph. For summary
graphs, see Lemma 1 below.

A list of independence statements associated with the missing edges of an independence
graph gives the graph's pairwise Markov property. Whenever this list defines the graph's
independence structure, the pairwise Markov property is said to be equivalent to
the global Markov property.

For all disjoint subsets a,b,c,d of node set V, the following general definitions are 
relevant, respectively, for combining pairwise independences in covariance graph and in 
concentration graph models. 

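Definition 2. The composition property is

$a \perp\!\!\!\perp b \mid d$ and $a \perp\!\!\!\perp c \mid d$ imply $a \perp\!\!\!\perp bc \mid d$.

Definition 3. The intersection property is

$a \perp\!\!\!\perp b \mid cd$ and $a \perp\!\!\!\perp c \mid bd$ imply $a \perp\!\!\!\perp bc \mid d$.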

Given these properties, the independence structure of interest in a covariance or
concentration graph model can be specified in terms of independence constraints on a set of
variable pairs. For general, searching discussions, see Dawid (1979), Pearl (1988), Lauritzen
(1996) and Studený (2005).
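For Gaussian variables, $a \perp\!\!\!\perp b \mid d$ holds exactly when the partial covariance $\Sigma_{ab\cdot d} = \Sigma_{ab} - \Sigma_{ad}\Sigma_{dd}^{-1}\Sigma_{db}$ vanishes. Since $\Sigma_{a,bc\cdot d}$ just places the blocks $\Sigma_{ab\cdot d}$ and $\Sigma_{ac\cdot d}$ side by side, the composition property follows at once. The sketch below checks this numerically for a hypothetical linear system with independent unit-variance residuals; the coefficients and the helper names `partial_cov` and `indep` are illustrative assumptions.

```python
import numpy as np

def partial_cov(S, a, b, cond):
    """Partial covariance S_{ab.cond} = S_ab - S_a,cond S_cond,cond^{-1} S_cond,b."""
    S_ab = S[np.ix_(a, b)]
    if not cond:
        return S_ab
    S_ac = S[np.ix_(a, cond)]
    S_cc = S[np.ix_(cond, cond)]
    S_cb = S[np.ix_(cond, b)]
    return S_ab - S_ac @ np.linalg.solve(S_cc, S_cb)

def indep(S, a, b, cond, tol=1e-10):
    """Gaussian case: a _||_ b | cond iff the partial covariance is zero."""
    return np.allclose(partial_cov(S, a, b, cond), 0.0, atol=tol)

# Hypothetical system for (a, b, c, d) = variables (0, 1, 2, 3):
# Y = B Y + e, with B[i, k] the coefficient of Y_k in the equation for Y_i
B = np.zeros((4, 4))
B[0, 3] = 0.8   # a <- d
B[1, 3] = 0.5   # b <- d
B[2, 1] = 0.3   # c <- b
B[2, 3] = 0.6   # c <- d
A = np.linalg.inv(np.eye(4) - B)
Sigma = A @ A.T   # residuals e assumed independent with unit variances

a, b, c, d = [0], [1], [2], [3]
print(indep(Sigma, a, b, d), indep(Sigma, a, c, d))  # True True
print(indep(Sigma, a, b + c, d))                     # True: composition
print(indep(Sigma, b, c, d))                         # False: b, c dependent given d
```

Here the pairwise statements $a \perp\!\!\!\perp b \mid d$ and $a \perp\!\!\!\perp c \mid d$ combine to $a \perp\!\!\!\perp bc \mid d$ even though b and c remain dependent given d.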

Necessary and sufficient conditions under which discrete and Gaussian distributions
satisfy the intersection property have been derived by San Martín, Mouchart and Rolin
(2005). They show, in particular, that some of the commonly specified sufficient conditions
may be much too strong, for instance, requiring exclusively positive probabilities for
discrete distributions. For joint Gaussian distributions, a positive definite joint covariance
matrix is sufficient. In both cases, no component of the involved random variables is
degenerate.

Definition 4. A family of joint distributions is said to vary fully if its random variables
contain no degenerate components and it satisfies the intersection property.

Definition 5. In families of joint distributions with the composition property, pairwise 
independent variables are also mutually independent. 

For families of joint distributions with the composition property, in which a regression 
graph with a complete concentration graph captures the independences of interest, the 
global and the pairwise Markov property are equivalent; see also Kang and Tian (2009). 
For a long time, only the family of Gaussian distributions was known to satisfy both
the composition and the intersection property, provided it varies fully. Under the same
type of constraint, this is now known to hold for the special family of distributions in
symmetric binary variables introduced by Wermuth, Marchetti and Cox (2009). More
importantly, as we shall see, it holds for families generated over so-called parent graphs.

The notion of completeness has been introduced and studied in quite different contexts
[see Lehmann and Scheffé (1955); Brown (1986), Theorem 2.12; and Mandelbaum and
Rüschendorf (1987)]. It means that the family of joint distributions of vector variable Y is
such that a zero expectation of any function g(Y) implies that the function itself is zero
with probability one, that is, almost surely (a.s.).

Definition 6. Let f(y) denote the density of a member of a complete family of
distributions and g(y) be some function of Y. Then it holds that

$$\int g(y)\, f(y)\, dy = 0 \;\Longrightarrow\; g(y) = 0 \quad \text{a.s.}$$

For any trivariate family of distributions with precisely two associated variable pairs,
say $(Y_1, Y_2)$ and $(Y_1, Y_3)$, but $2 \perp\!\!\!\perp 3$, completeness of the joint distribution is sufficient
to conclude that $Y_2$ is conditionally dependent on $Y_3$ given $Y_1$. This follows from
Corollary 3 of Wermuth and Cox (2004) and properties of completeness. In this situation, the
generating graph

2 → 1 ← 3

induces a 2,3-edge in the summary graph obtained by conditioning on node 1 and
a non-vanishing conditional association for $Y_2, Y_3$ given $Y_1$.
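A Gaussian instance of the collision V 2 → 1 ← 3 shows the induced association numerically. The sketch assumes, purely for illustration, that $Y_2$ and $Y_3$ are independent standard normal and $Y_1 = Y_2 + Y_3 + e_1$ with $e_1$ standard normal, which gives the covariance matrix below; `partial_corr` is a hypothetical helper that reads partial correlations off an inverted covariance matrix.

```python
import numpy as np

# Hypothetical process for 2 -> 1 <- 3, variables stored in the order (Y1, Y2, Y3):
# Y2, Y3 independent N(0, 1) and Y1 = Y2 + Y3 + e1 with e1 ~ N(0, 1).
Sigma = np.array([[3.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])

def partial_corr(S, i, k, cond):
    """Partial correlation of variables i and k given `cond`, computed from
    the inverse of the covariance matrix of (i, k, cond)."""
    idx = [i, k] + list(cond)
    P = np.linalg.inv(S[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

r_marg = partial_corr(Sigma, 1, 2, [])    # Y2, Y3 marginally: zero
r_cond = partial_corr(Sigma, 1, 2, [0])   # Y2, Y3 given Y1: about -0.5
print(r_marg, r_cond)
```

Marginally the correlation of $Y_2$ and $Y_3$ is zero, while conditioning on their common offspring $Y_1$ yields a partial correlation of about $-0.5$. This illustrates the Gaussian case only; it is not a proof of the completeness argument.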

In Section 2, we define parent graphs as directed acyclic graphs with special properties 
and corresponding types of stepwise generating processes such that edge-inducing paths 
are also association-inducing. The families of distributions generated over parent graphs 
and the members of the families satisfy the intersection and the composition property in 
addition to the general laws of probability that govern independences in any joint family 
of distribution; for a discussion of the latter see Studeny (2005). 

1.3. Definition and construction of summary graphs 

In contrast to MC graphs and MAGs, regression graphs are not closed under
marginalising and conditioning; that is, marginalising and conditioning may lead from a given
regression graph to a graph outside the class of regression graphs, as illustrated with
Figure 3 below. But the graph resulting in this way from any regression graph is always
within the class of summary graphs. This partly explains why we study the larger class of
summary graphs.

Definition 7. A summary graph, $G^{N}_{\mathrm{sum}}$, has node set N, which consists of disjoint
subsets u, v, ordered as (u, v). Within u, the graph has a mixture of a directed acyclic
graph and of a covariance graph and, within v, it has a concentration graph. Between u
and v, all edges are arrows pointing from v to u.
The notions of parents, offspring, ancestors and descendants remain unchanged in
a summary graph compared to a directed acyclic graph. As will be shown, every summary
graph in node set N can be generated from a directed acyclic graph in node set $V = \{N, C, M\}$
by conditioning on C and marginalising over M, so that $N = V \setminus \{C, M\}$.

The summary graph is denoted by $G^{N}_{\mathrm{sum}}$ and an associated density by $f_{N|C}$; this density
results from $f_V$, the given density of the generating graph, which factorizes according to
this graph; see (1.3) below.

The density $f_{N|C}$ may concern discrete, continuous or mixed variables, as implied
by $f_V$. It has a factorization according to (u, v) that is written compactly in terms of
node sets as

$$f_{N|C} = f_{u|vC}\, f_{v|C}. \qquad (1.1)$$

In the larger generating graph in node set V, every node in v and no node in u is an
ancestor of the conditioning set C. Thus, each component of $Y_v$ has been generated before
$Y_u$; see Figure 2 for an example.

Figures 1-3 illustrate how summary graphs may be generated. For this, the stepwise
construction of a summary graph by marginalising over m = {t} or conditioning on
c = {s} in $G^{N}_{\mathrm{sum}}$ is given in Table 1.

If a node t is coupled with both of the nodes i and k, then t is said to be their common
neighbor. In two-edge paths, the inner node t is named a collision node for

i → t ← k,   i → t - - - k,   i - - - t - - - k,

and a transmitting node otherwise. A path for which all inner nodes are collision nodes
is a collision path and a path for which all inner nodes are transmitting nodes is a
transmitting path.
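The three collision configurations share one feature: both edges meet the inner node with an arrowhead, provided a dashed line is regarded as carrying arrowheads at both ends. The sketch below adopts that reading as an explicit assumption and classifies inner nodes and whole paths. The edge encoding `(x, y, kind)` and the function names are hypothetical.

```python
def head_at(edge, node):
    """Does `edge` meet `node` with an arrowhead?
    edge = (x, y, kind): 'arrow' is x -> y, 'dashed' is x - - - y, 'full' is x -- y.
    Assumed convention: a dashed line counts as headed at both ends."""
    x, y, kind = edge
    if kind == "dashed":
        return True
    if kind == "arrow":
        return node == y
    return False  # full line: no arrowheads

def inner_node_type(e_left, e_right, t):
    """Collision node if both edges meet t with an arrowhead, else transmitting."""
    return "collision" if head_at(e_left, t) and head_at(e_right, t) else "transmitting"

def path_type(nodes, edges):
    """Classify a path; edges[j] couples nodes[j] and nodes[j + 1]."""
    if len(nodes) == 2:
        return "edge"
    kinds = {inner_node_type(edges[j - 1], edges[j], nodes[j])
             for j in range(1, len(nodes) - 1)}
    if kinds == {"collision"}:
        return "collision path"
    if kinds == {"transmitting"}:
        return "transmitting path"
    return "neither"

print(path_type([2, 1, 3], [(2, 1, "arrow"), (3, 1, "arrow")]))   # collision path
print(path_type([2, 1, 3], [(1, 2, "arrow"), (1, 3, "arrow")]))   # transmitting path
print(path_type([4, 5, 6], [(4, 5, "arrow"), (5, 6, "dashed")]))  # collision path
print(path_type([4, 5, 6], [(4, 5, "full"), (5, 6, "arrow")]))    # transmitting path
```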

Table 1. Types of induced edge when each of m or c contains a single node

Types of induced edge when marginalising over the common neighbor node t
[edge diagrams not recoverable from the extracted text]

and types of induced edge when conditioning on the common neighbor node s or on
one of the descendants of s
[edge diagrams not recoverable from the extracted text]

where the ■ notation indicates a symmetric entry.
Figure 1. (a) A summary graph with node 4 to be marginalised over and node 5 to be
conditioned on, (b) the graph of (a) including edges induced by conditioning on node 5, (c) the
graph of (a) including edges induced by marginalising over node 4, (d) $G^{N'}_{\mathrm{sum}}$ with $N' = N \setminus \{4, 5\}$.
Figure 2. (a) A directed acyclic graph generating (b) a summary graph without semi-directed
cycles; u = {1, 2, 3, 4} and v = {5, 6, 7, 8}.
Figure 3. (a) A directed acyclic graph generating (b) a summary graph with v as the empty
set and several semi-directed cycles: the 4,4-path with inner nodes 1, 2, 3, the 6,6-path via inner
node 5 and the double edge for (6, 7).



Table 1 is taken from Wermuth, Cox and Pearl (1994). In the Appendix, we show
that the types of edge are self-consistent when they are induced using Table 1. The table
implies in particular that a collision node is edge-inducing by conditioning on it, while
a transmitting node is edge-inducing by marginalising over it.

Let a summary graph, $G^{N}_{\mathrm{sum}}$, be given and nodes $s \neq t$ of N be selected. Suppose
one intends to marginalise over node t and to condition on node s, and let $d_s$ denote the
ancestors of s within u of $G^{N}_{\mathrm{sum}}$. Then a new summary graph in node set $N' = N \setminus \{s, t\}$
results by using the procedure given in Proposition 1 below. The graph $G^{N'}_{\mathrm{sum}}$ has
its concentration graph in $v' = v \setminus \{s, t\}$ whenever both nodes are in v, in $v' = v \setminus \{s\}$ for
only s in v, in $v' = \{v \setminus \{t\}, d_s\}$ for only t in v and in $v' = \{v, d_s\}$ for both nodes in u.

Proposition 1 (Generating a summary graph from $G^{N}_{\mathrm{sum}}$ by operating on at
most two nodes). From $G^{N}_{\mathrm{sum}}$, the independence-preserving summary graph $G^{N'}_{\mathrm{sum}}$ is
generated, with t the marginalising node and s the conditioning node, by

(1) inducing edges as prescribed in Table 1, first for the neighbors of t and second for the
neighbors of s and of all of its ancestors, ignoring in the second step edges involving t,

(2) changing each edge present within v' into a full line and each edge present between
u' and v' into an arrow pointing from v' to u',

(3) keeping, for each node pair coupled by several edges of the same kind, just one, and
deleting all nodes and edges involving s or t.

Section 3 contains proofs in terms of operators for matrix representations of graphs.
The proofs imply, for any node subset {m, c} of N, that $G^{N \setminus m}_{\mathrm{sum}}$ may be derived before
conditioning on set c, or $G^{N \setminus c}_{\mathrm{sum}}$ before marginalising over set m, and that within sets c
or m any order of the nodes can be chosen. In particular, in step (1) one may first work
on the neighbors of s and of all of its ancestors and second on the neighbors of t, ignoring
edges involving s in this second step.

The matrix formulations lead more directly to $G^{N'}_{\mathrm{sum}}$, but Proposition 1 gives an
algorithm for operating on one node at a time. It is also helpful for small graphs, as
illustrated with Figures 1-3. Proposition 1 implies that no coupled pair ever gets uncoupled
and that the two types of path that may occur when constructing a summary graph are
replaced in $G^{N'}_{\mathrm{sum}}$ as follows:

[edge diagrams not recoverable from the extracted text]

The starting summary graph of Figure 1 is in 1(a). For s = 5 and t = 4, Figure 1(b)
shows the edges induced by operating first on s, Figure 1(c) those induced by operating
first on t, and Figure 1(d) displays $G^{N'}_{\mathrm{sum}}$.

By construction, a summary graph contains no directed cycle, but possibly semi-directed
cycles. These are direction-preserving cycles containing at least one undirected
edge; see, for instance, nodes 1, 2, 3, 4 of Figure 3(b).

Corollary 1 (Regression graphs and summary graphs). A regression graph is 
a summary graph without semi-directed cycles. 

In contrast to a summary graph, a regression graph is an independence graph that has
at most one edge coupling any node pair; compare Figures 2(b) and 3(b). Figure 2(b)
shows a regression graph generated from a directed acyclic graph and Figure 3(b) a summary
graph with semi-directed cycles.
By replacing each dashed ik-edge by an ik-path i ← ∘ → k whose inner node is marginalised
over, every summary graph has a virtual generating directed acyclic graph for the nodes
within u, even though a dashed line might actually have been generated by over-conditioning,
that is, by including an offspring in the conditioning set of two of its parents; see, for
example, the inner nodes of the 6,7-path in Figure 3(a).

Similarly, cycles in four or more nodes within v may be generated from a larger
directed acyclic graph by including additional nodes in appropriate ways; see
Cox and Wermuth (2000). The summary graph in node set N is uniquely defined if
generated from a directed acyclic graph in node set V for given sets M, C, but typically
many different directed acyclic graphs, in node sets larger than N, may lead to the same
summary graph.

1.4. Independence interpretation of summary graphs 

A criterion to decide whether a given summary graph, $G^{N}_{\mathrm{sum}}$, implies $\alpha \perp\!\!\!\perp \beta \mid cC$ is
given next. For this, the node set N is partitioned as $N = \{\alpha, \beta, c, m\}$, where only the
subsets c or m may be empty.

Lemma 1 (Path criterion for the global Markov property [Koster (2002),
Sadeghi (2009)]). The graph $G^{N}_{\mathrm{sum}}$ implies $\alpha \perp\!\!\!\perp \beta \mid cC$ if and only if it has no
ik-path between $\alpha$ and $\beta$ such that every inner collision node is in c or has a descendant in
c and every other inner node is outside c.

In addition to the directly described path, Lemma 1 specifies many special types of
forbidden path. We name a path of n > 2 nodes an a-line path if all its inner nodes are
within set a. The marginalising set for $\alpha \perp\!\!\!\perp \beta \mid cC$ in $G^{N}_{\mathrm{sum}}$ is implicitly defined by
$m = N \setminus \{\alpha, \beta, c\}$. Then, in $G^{N}_{\mathrm{sum}}$, there should be, for node i in $\alpha$ and node k in $\beta$, no
ik-edge, no m-line transmitting ik-path, no c-line collision ik-path and no ik-path with
all inner transmitting nodes in m and all inner collision nodes in c.

Corollary 2 (Active ik-paths). An ik-path in $G^{N}_{\mathrm{sum}}$ is active relative to [c, m] if and
only if it is an ik-edge or every inner transmitting node is in m and every inner collision
node is in c or has a descendant in c.
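Corollary 2 translates directly into a check on a single path. The sketch below is illustrative only: it assumes the hypothetical edge encoding `(x, y, kind)` with a dashed line treated as headed at both ends, and it assumes the caller supplies a dict `descendants` mapping each node to its set of descendants in the summary graph.

```python
def head_at(edge, node):
    """edge = (x, y, kind): 'arrow' is x -> y; 'dashed' is assumed headed at
    both ends; 'full' carries no arrowheads."""
    x, y, kind = edge
    return kind == "dashed" or (kind == "arrow" and node == y)

def is_active(nodes, edges, c, m, descendants):
    """Corollary 2: an ik-path is active relative to [c, m] iff it is an ik-edge
    or every inner transmitting node is in m and every inner collision node
    is in c or has a descendant in c. edges[j] couples nodes[j], nodes[j + 1]."""
    if len(nodes) == 2:
        return True                                   # an ik-edge
    for j in range(1, len(nodes) - 1):
        t = nodes[j]
        if head_at(edges[j - 1], t) and head_at(edges[j], t):    # collision node
            if t not in c and not (descendants.get(t, set()) & c):
                return False
        elif t not in m:                                           # transmitting node
            return False
    return True

collider = ([2, 1, 3], [(2, 1, "arrow"), (3, 1, "arrow")])   # 2 -> 1 <- 3
fork = ([2, 1, 3], [(1, 2, "arrow"), (1, 3, "arrow")])       # 2 <- 1 -> 3
print(is_active(*collider, c={1}, m=set(), descendants={}))         # True
print(is_active(*collider, c=set(), m={1}, descendants={}))         # False
print(is_active(*collider, c={4}, m=set(), descendants={1: {4}}))   # True
print(is_active(*fork, c=set(), m={1}, descendants={}))             # True
```

The third call corresponds to conditioning on a descendant (hypothetical node 4) of the collision node rather than on the node itself.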

If an active ik-path relative to [c, m] has uncoupled endpoints, the path is closed by an
ik-edge in $G^{N'}_{\mathrm{sum}}$, with $N' = N \setminus \{c, m\}$. If an active ik-path has coupled endpoints,
the path is edge-inducing in the construction process of $G^{N'}_{\mathrm{sum}}$. Thus, we sometimes replace
'active' by the more concrete term 'edge-inducing'.

Figure 2(b) represents a regression graph; hence each missing edge corresponds to
at least one independence statement. This contrasts with Figure 3(b), which has
semi-directed cycles and for which no independence statement is implied for the pairs
(1, 5), (5, 7), (5, 8) and (6, 8). For pair (1, 5), we give more detailed arguments.
Figure 4. Important special cases of summary graphs. The two pairs X, Y and Z, U are
constrained given $Y_C$: with $X \perp\!\!\!\perp Y \mid ZU$ in (a)-(c), with $X \perp\!\!\!\perp Y \mid U$ in (d), (e) and with $X \perp\!\!\!\perp Y$ in (f);
with $Z \perp\!\!\!\perp U$ in (c), (e), (f), with $Z \perp\!\!\!\perp U \mid Y$ in (b), (d) and with $Z \perp\!\!\!\perp U \mid XY$ in (a).

In the graph of Figure 2(b), node 3 has no descendants and is an inner collision node
in every path connecting 1 and 5. Hence, when node 3 is marginalised over, $1 \perp\!\!\!\perp 5 \mid C$ is
implied. In the graph of Figure 3(b), pair (1, 5) is connected by a descendant-ancestor
path with inner nodes in {2, 3, 4}. Therefore, a 1,5-edge is induced by marginalising over
nodes 2, 3, 4 and hence $1 \perp\!\!\!\perp 5 \mid C$ is not implied. A 1,5-edge is also induced by conditioning on
node 4 or on any of its descendants in {1, 2, 3}, so that $1 \perp\!\!\!\perp 5 \mid cC$ is not implied for $c \neq \emptyset$.

Figure 4 shows special cases of summary graphs, noting that C and one of u, v may
be empty sets. It shows that summary graphs cover all six possible combinations
of independence constraints on two non-overlapping pairs of four variables X, Z, U, Y.
Substantive research examples with data well fitted by linear models of Figure 4 have
been given by Cox and Wermuth (1993) for the concentration graph in Figure 4(a), the
directed acyclic graph in Figure 4(b), the graph of seemingly unrelated regressions
in Figure 4(d) and the covariance graph in Figure 4(f).

1.5. Markov equivalence 

The notion of Markov equivalence is important because, for any given set of data, one
cannot distinguish between two Markov equivalent graph models on the basis of
goodness-of-fit tests.

Definition 8. Two different graphs in node set N are Markov equivalent if they capture 
the same independence structure. 

Since a different set of two independence statements is associated with each of the
graphs in Figure 4, no two of the six graphs are Markov equivalent.

Known conditions, under which a concentration graph or a covariance graph is Markov 
equivalent to a directed acyclic graph, may be proven by orienting the graphs, that is, by 
changing each edge present into an arrow. The same type of argument can be extended 
to other independence graphs such as to regression graphs; see also Proposition 2 below. 
For this, we need a few more definitions for graphs. 

For $a \subseteq N$, the subgraph induced by a is obtained by keeping all nodes in a and all
edges coupling nodes within a. A subgraph induced by three nodes that has two edges is
named a V-configuration or simply a V. A path is said to be chordless if each inner node
forms a V with its two neighbors.
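The definitions of a V and of a chordless path only depend on which node pairs are coupled, not on the edge types. The sketch below follows the definition exactly as stated, checking each inner node against its two neighbours on the path. It assumes, hypothetically, that the skeleton is given as a symmetric dict from each node to the set of nodes coupled to it; the function names are illustrative.

```python
def coupled(adj, i, k):
    """adj: dict node -> set of nodes coupled to it by any edge (the skeleton)."""
    return k in adj.get(i, set())

def is_V(adj, i, t, k):
    """Nodes i, t, k induce a V if the pairs (i, t) and (t, k) are coupled
    but the pair (i, k) is not."""
    return coupled(adj, i, t) and coupled(adj, t, k) and not coupled(adj, i, k)

def is_chordless(adj, path):
    """Each inner node forms a V with its two neighbours on the path."""
    return all(is_V(adj, path[j - 1], path[j], path[j + 1])
               for j in range(1, len(path) - 1))

cycle = {"X": {"Z", "U"}, "Z": {"X", "Y"}, "Y": {"Z", "U"}, "U": {"Y", "X"}}
print(is_chordless(cycle, ["X", "Z", "Y", "U"]))    # True

chorded = {k: set(v) for k, v in cycle.items()}
chorded["X"].add("Y")
chorded["Y"].add("X")
print(is_chordless(chorded, ["X", "Z", "Y", "U"]))  # False: X and Y now coupled
```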
For V's of a regression graph that are collision paths with endpoints i and k, the inner
node is excluded from the conditioning set of every independence statement for $Y_i, Y_k$
implied by the graph. In contrast, for V's of a regression graph that are transmitting
paths, the inner node is included in the conditioning set of every independence statement
for $Y_i, Y_k$ implied by the graph. Thus, the independence structure of the graph would be
changed whenever any collision V were replaced by a transmitting V.

A concentration graph with a chordless 4-cycle, as in Figure 4(a), or with any larger
chordless cycle, is not Markov equivalent to a directed acyclic graph; see Dirac (1961)
and Lauritzen (1996). The reason is that it is impossible to orient the graph, that is, to
replace each edge by an arrow, without obtaining either a directed cycle or at least one
collision V.
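Undirected graphs without chordless cycles in four or more nodes are called chordal. A standard way to test for this, not taken from this paper, is the maximum cardinality search of Tarjan and Yannakakis: a graph is chordal if and only if, for every node, its neighbours visited earlier in the search are pairwise coupled. The sketch assumes a symmetric adjacency dict that lists every node as a key; `mcs_order` and `is_chordal` are hypothetical names.

```python
from itertools import combinations

def mcs_order(adj):
    """Maximum cardinality search: repeatedly visit an unvisited node with
    the largest number of already visited neighbours."""
    order, weight = [], {v: 0 for v in adj}
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for w in adj[v]:
            if w in weight:
                weight[w] += 1
    return order

def is_chordal(adj):
    """Chordal iff, for every node, its neighbours visited earlier in the
    search form a complete subgraph (the reversed order is then a perfect
    elimination ordering)."""
    order = mcs_order(adj)
    pos = {v: j for j, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
    return True

cycle4 = {"X": {"Z", "U"}, "Z": {"X", "Y"}, "Y": {"Z", "U"}, "U": {"Y", "X"}}
print(is_chordal(cycle4))   # False: chordless 4-cycle, as in Figure 4(a)
chorded = {"X": {"Z", "U", "Y"}, "Z": {"X", "Y"}, "Y": {"Z", "U", "X"}, "U": {"Y", "X"}}
print(is_chordal(chorded))  # True
```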

Similarly, a covariance graph is not Markov equivalent to a directed acyclic graph
if it contains a chordless collision path in four nodes; see Pearl and Wermuth (1994).
The reason is that it is impossible to orient each edge without obtaining at least one
transmitting V. There are the following three types of chordless collision paths in four
nodes in a regression graph:

∘ → ∘ - - - ∘ ← ∘,   ∘ → ∘ - - - ∘ - - - ∘,   ∘ - - - ∘ - - - ∘ - - - ∘.

The next result, Proposition 2, explains why, in general, three types of edge are needed
after marginalising and conditioning in a directed acyclic graph.

Proposition 2 (Lack of Markov equivalence). If a regression graph contains
a chordless collision path in four distinct nodes or a chordless cycle in $n \geq 4$ nodes
within v, then it is not Markov equivalent to any directed acyclic graph in the same node
set.

Proof. It is impossible to orient the graph with any one of the above chordless collision
paths in four nodes into edges of a directed acyclic graph without switching between the
two types of inner node in at least one V, that is, between a collision and a transmitting
node. And, for the chordless cycle in $n \geq 4$ nodes, the above result for concentration
graphs due to Dirac applies. □

Currently, one knows how to generate three types of independence-preserving graphs
from a given directed acyclic graph in node set V for the same disjoint subsets M and
C of V. In an MC graph, four types of edge may occur in combination: i ← k, i → k,
i ── k and i - - - k. A summary graph may have only one type of double edge, an arrow
combined with a dashed line, i ← k together with i - - - k, and three types of single edge,
i ← k, i ── k and i - - - k, while the maximal ancestral graph is an independence graph
with up to three types of single edge, i ← k, i ── k and i - - - k, where, traditionally,
the edge i - - - k is drawn as a double-headed arrow. For proofs of Markov equivalence of
the three corresponding types of graphs, see Sadeghi (2009). In Section 3.6 below, the
unique MAG corresponding to a given summary graph is constructed.
1.6. Families of distribution generated over parent graphs

A distribution with joint density $f_V$ is said to be generated over a directed acyclic graph
whenever $f_V$ factorizes recursively into univariate conditional densities that satisfy the
independence constraints specified with the graph. Any full ordering of V is compatible
with a given directed acyclic graph if, for each node i, all ancestors of i are in
$\{i+1, \ldots, d_V\}$. The set of parent nodes of i is denoted by $\mathrm{par}_i$.

For $V = (1, \ldots, d_V)$ specifying a compatible ordering of node set V, a defining list of
constraints for a directed acyclic graph is

$$f_{i|i+1,\ldots,d_V} = f_{i|\mathrm{par}_i} \iff i \perp\!\!\!\perp \{i+1, \ldots, d_V\} \setminus \mathrm{par}_i \mid \mathrm{par}_i \qquad (1.2)$$

and the factorization of the density generated over the graph is

$$f_V = \prod_{i=1}^{d_V} f_{i|\mathrm{par}_i}. \qquad (1.3)$$

To generate $f_V$ recursively, one can take any compatible ordering of V.

Definition 9. For a recursive generating process of $f_V$, one starts with the marginal
density $f_{d_V}$ of $Y_{d_V}$, proceeds with the conditional density of $Y_{d_V - 1}$ given $Y_{d_V}$, continues
to $f_{i|i+1,\ldots,d_V}$ and ends with the conditional density of $Y_1$ given $Y_2, \ldots, Y_{d_V}$.
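The recursive process of Definition 9, combined with the factorization (1.3), amounts to drawing $Y_{d_V}$ first and $Y_1$ last, each variable given its parents only. The sketch below illustrates this for binary variables under a purely hypothetical logistic model for each conditional distribution; the coefficient values and the names `is_compatible` and `generate` are assumptions for illustration, not part of the paper.

```python
import math
import random

def is_compatible(parents):
    """Ordering (1, ..., d_V) is compatible if every parent of i has a larger
    label than i; by induction, every ancestor of i then has one too."""
    return all(k > i for i, ps in parents.items() for k in ps)

def generate(parents, coef, intercept, rng):
    """One draw of binary Y_1, ..., Y_dV as in Definition 9: start with Y_dV,
    end with Y_1, each Y_i drawn given Y_par_i only, as in (1.3).
    Hypothetical model: P(Y_i = 1 | y_par) = 1 / (1 + exp(-eta_i))."""
    if not is_compatible(parents):
        raise ValueError("ordering is not compatible with the graph")
    d = len(parents)
    y = {}
    for i in range(d, 0, -1):
        eta = intercept[i] + sum(coef[(i, k)] * y[k] for k in parents[i])
        y[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
    return y

# Hypothetical parent graph 2 -> 1 <- 3, labelled compatibly with its arrows
parents = {1: {2, 3}, 2: set(), 3: set()}
coef = {(1, 2): 1.5, (1, 3): -1.0}
intercept = {1: 0.0, 2: 0.2, 3: -0.3}
print(generate(parents, coef, intercept, random.Random(1)))
```

Because every parent of i carries a larger label, its value is always available when $Y_i$ is drawn, which is exactly what a compatible ordering guarantees.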

For a directed acyclic graph to represent such a recursive generating process,
the graph has to capture both independences and dependences.

Definition 10. A directed acyclic graph, with a given compatible ordering of V, is
edge-minimal for $f_V$ generated over it if

$$f_{i|\mathrm{par}_i} \neq f_{i|\mathrm{par}_i \setminus l} \quad \text{for each } l \in \mathrm{par}_i.$$

Under this condition of edge-minimality of the generating graph for $f_V$, all relevant
explanatory variables are included for each $Y_i$ and no edge can be removed from the
graph without changing the independence statements satisfied by $Y_i$ given its past,
$\mathrm{pst}_i = \{i+1, \ldots, d_V\}$.

An edge-minimal graph may represent a research hypothesis in a given substantive 
context. For such a hypothesis, those dependences are considered that are strong enough 
to be of substantive interest while others are translated into independence statements; 
see Wermuth and Lauritzen (1990). 

Definition 11. A recursive generating process of $f_V$ in the order $V = (1, \ldots, d_V)$ is said
to consist of freely chosen components $Y_i$ if each $Y_i$ can be discrete or continuous and the
parameters of $f_{i|\mathrm{pst}_i}$ are variation independent of those of $f_{\mathrm{pst}_i}$. The form of the family
of distributions of $Y_i$ given $Y_{\mathrm{pst}_i}$ may be of any type.
For exponential families of distributions, variation-independent factorizations $f_{i,\mathrm{pst}_i} = f_{i|\mathrm{pst}_i}\,f_{\mathrm{pst}_i}$ coincide with the notion of a cut given by Barndorff-Nielsen (1978), page 50. These types of factorization imply that the overall likelihood function can be maximized by maximizing each factor $f_{i|\mathrm{pst}_i}$ separately.
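A minimal sketch of Definition 9 and of factor-wise fitting, assuming a linear Gaussian triangular system (only one special case of the freely chosen components allowed above). The parent sets, coefficients and residual standard deviations are hypothetical. Variables are drawn in the order $Y_4, Y_3, Y_2, Y_1$, and each factor $f_{i|\mathrm{par}_i}$ is then fitted on its own by least squares, which in this Gaussian case is the maximum-likelihood fit of that factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parent graph on V = (1, 2, 3, 4), stored 0-based: par[i] lists parents k > i.
par = {0: [1, 3], 1: [2], 2: [3], 3: []}
A = np.eye(4)                        # A Y = eps, so Y_i = -A[i, par_i] @ Y_par_i + eps_i
A[0, 1], A[0, 3] = -0.5, 0.3
A[1, 2] = -0.7
A[2, 3] = 0.4
sd = np.array([0.7, 1.0, 0.9, 1.4])  # hypothetical residual standard deviations

# Recursive generation: Y_4 from its marginal, then Y_3 given Y_4, ..., finally Y_1.
n = 20_000
Y = np.zeros((n, 4))
for i in reversed(range(4)):
    pa = par[i]
    mean = -Y[:, pa] @ A[i, pa] if pa else np.zeros(n)
    Y[:, i] = mean + sd[i] * rng.normal(size=n)

# Factor-wise fitting: every conditional density f_{i|par_i} is estimated separately.
A_hat = np.eye(4)
for i, pa in par.items():
    if pa:
        beta, *_ = np.linalg.lstsq(Y[:, pa], Y[:, i], rcond=None)
        A_hat[i, pa] = -beta

print(np.round(A_hat, 2))
```

Each least-squares fit uses only the columns of $Y_i$ and $Y_{\mathrm{par}_i}$, so the four fits need no coordination; the estimated rows agree with $A$ up to sampling error of order $1/\sqrt{n}$.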

In families of distributions with $f_V$ consisting of freely chosen components that satisfy the defining independences (1.2) of the given graph, some further constraints on each $f_{i|\mathrm{par}_i}$ are possible, such as no higher-order interactions, or requiring $Y_i$ to have dependences of equal strength on several of its explanatory variables, that is, on several components of $Y_{\mathrm{par}_i}$. Excluded are, for instance, constraints across conditional distributions, such as requiring dependences of $Y_i$ on some of $Y_{\mathrm{par}_i}$ to be equal to those of $Y_j$ on some of $Y_{\mathrm{par}_j}$.

Freely chosen components $Y_i$ are in general incompatible with distributions that are to be faithful to a generating directed acyclic graph. The notion of faithfulness was introduced by Spirtes, Glymour and Scheines (1993). It means that the independence structure of $f_V$ coincides with the independence structure captured by the graph, and it leads in general to complex constraints on the parameter space for distributions generated over parent graphs; see Figure 1 of Wermuth, Marchetti and Cox (2009) for a simple example with three binary variables. In contrast, variation independence permits special constellations of parameter values that may lead to independences in $f_V$ that are additional to those implied by the graph.

For research hypotheses, defined in terms of recursive constraints on the independence structure and on dependences of $f_V$, appropriate specifications and resulting properties can now be given. For this, only connected graphs are considered, that is, graphs with each node pair connected by at least one path.

Definition 12. A connected, directed, acyclic graph is named a parent graph, $G^V_{\mathrm{par}}$, when one ordering of its node set $V = (1, \ldots, d_V)$ is given for the recursive generating process of $f_V$ and it is edge-minimal for $f_V$.

Definition 13. A family of distributions is said to be generated over a given parent graph if it varies fully and each component of $f_V$ is freely chosen in the recursive generating process of $f_V$.

Proposition 3 (General properties of families of distributions generated over $G^V_{\mathrm{par}}$). A family of distributions generated over $G^V_{\mathrm{par}}$, and each of its members, satisfies the intersection and the composition property. Every $ik$-path present in $G^V_{\mathrm{par}}$ that induces an $ik$-edge by marginalising or conditioning is also association-inducing for $Y_i, Y_k$.

Proof. The intersection property holds by the definition of fully varying distributions. The composition property holds by the definition of a parent graph, since pairwise independences without mutual independence cannot result for edge-minimal, connected graphs that are directed and acyclic. More precisely, let $i < k$, and let $c$, $d$ be disjoint subsets of $\mathrm{pst}_i \setminus k$; then both $i \perp\!\!\!\perp c \mid d$ and $k \perp\!\!\!\perp c \mid d$ can be in the defining list of independences only if the statement $i \perp\!\!\!\perp c \mid kd$ is also satisfied. In this case, $f_{ikc|d} = f_{i|kd}\,f_{k|d}\,f_{c|d} = f_{ik|d}\,f_{c|d}$, so that $ik \perp\!\!\!\perp c \mid d$ is implied. Finally, edge-minimality of a connected $G^V_{\mathrm{par}}$ and freely chosen densities $f_{i|\mathrm{pst}_i}$ ensure that each edge-inducing path is also association-inducing. □

Excluded are incomplete families of distributions in which the independence statement associated with each $V$ is not unique. For instance, for an uncoupled node pair $i, k$ with transition $i \leftarrow j \leftarrow k$ and $h \subseteq \mathrm{pst}_k$, it is impossible that

$$ \int f_{ijk|h}\,dy_j = f_{i|h}\,f_{k|h}, \quad\text{or equivalently}\quad \int (f_{i|jh} - f_{i|h})\,f_{j|kh}\,dy_j = 0. $$



1.7. Using summary graphs to detect distortions of generating dependences

In a MAG, the dependence corresponding to an $ik$-arrow may differ qualitatively, without any warning, from the generating dependence of $Y_i$ on $Y_k$ in $f_V$. In particular, it may change sign but remain a strong dependence. If this remained undetected, one would come to qualitatively wrong conclusions when interpreting the parameters measuring the conditional dependence of $Y_i$ on $Y_k$ in $f_{u|vC}$.

The summary graph corresponding to a MAG detects whether, and for which of the generating dependences $i \leftarrow k$ having both $i, k$ within $u$, such distortions can occur due to direct or indirect confounding; see Wermuth and Cox (2008) and Corollary 4, Lemma 1 below. We illustrate here direct confounding with Figure 5 and indirect confounding with Figure 6.

For a joint Gaussian distribution, the distortions are compactly described in terms of regression coefficients for variables $Y_j$ standardized to have mean zero and variance one. For Figure 5(a), the generating equations are

$$ Y_1 = \alpha Y_2 + \delta Y_4 + \varepsilon_1, \quad Y_2 = \lambda Y_3 + \gamma Y_4 + \varepsilon_2, \quad Y_3 = \varepsilon_3, \quad Y_4 = \varepsilon_4. \qquad (1.4)$$

With residuals having zero means and being uncorrelated, the equations of the summary graph model that result from (1.4) for $Y_4$ unobserved have one pair of correlated residuals:

$$ Y_1 = \alpha Y_2 + \eta_1, \quad Y_2 = \lambda Y_3 + \eta_2, \quad Y_3 = \eta_3, $$
$$ \eta_1 = \delta Y_4 + \varepsilon_1, \quad \eta_2 = \gamma Y_4 + \varepsilon_2, \quad \eta_3 = \varepsilon_3, \quad \mathrm{cov}(\eta_1, \eta_2) = \gamma\delta. $$

Figure 5. (a) Generating graph for Gaussian relations in standardized variables, leading for variable $Y_4$ unobserved to (b) the summary graph and (c) the maximal ancestral graph for the observed variables; with the generating dependences attached to the arrows in (a), simple correlations $\rho_{12} = \alpha + \gamma\delta$, $\rho_{13} = \alpha\lambda$, $\rho_{23} = \lambda$ and $\theta = \gamma\delta/(1 - \lambda^2)$ are implied.

Figure 6. (a) Generating graph for linear relations in standardized variables, leading for variable $Y_5$ unobserved to (b) the summary graph and (c) the maximal ancestral graph for the observed variables; with the generating dependences attached to the arrows in (a), implied are: $\theta = \gamma\delta/(1 - \tau^2)$; generating dependence $\lambda$ undistorted in the models to both graphs (b) and (c); generating dependence $\alpha$ preserved with (b), distorted with (c).

The equation parameters of the standardized Gaussian model associated with the MAG of Figure 5(c) are instead defined via

$$ E(Y_1 \mid Y_2 = y_2, Y_3 = y_3), \qquad E(Y_2 \mid Y_3 = y_3), $$

with all residuals in the recursive equations being uncorrelated. The generating dependence $\alpha$ is retained in the summary graph model.

The parameter for the dependence of $Y_1$ on $Y_2$ in the MAG model, expressed in terms of the generating parameters of Figure 5(a), is $\alpha + \gamma\delta/(1 - \lambda^2)$. The summary graph in Figure 5(b) is a graphic representation of the simplest type of instrumental variable model, used in econometrics [see Sargan (1958)] to separate a direct confounding effect, here $\gamma\delta$, from the dependence of interest, here $\alpha$.
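The distortion can be checked numerically. The Python sketch below assumes the hypothetical values $\alpha = 0.3$, $\delta = 0.4$, $\lambda = 0.5$ and $\gamma = 0.6$, writes (1.4) as $AY = \varepsilon$ with residual variances chosen so that all four variables are standardized, and then drops $Y_4$. It compares the least-squares coefficient of $Y_2$ in the regression of $Y_1$ on $(Y_2, Y_3)$ with $\alpha + \gamma\delta/(1-\lambda^2)$, and computes the instrumental-variable ratio $\rho_{13}/\rho_{23}$, which equals $\alpha$ under these equations.

```python
import numpy as np

# Hypothetical generating dependences for Figure 5(a).
alpha, delta, lam, gamma = 0.3, 0.4, 0.5, 0.6

# Equations (1.4) as A Y = eps with Y = (Y1, Y2, Y3, Y4), 0-based indices.
A = np.eye(4)
A[0, 1], A[0, 3] = -alpha, -delta   # Y1 = alpha*Y2 + delta*Y4 + eps1
A[1, 2], A[1, 3] = -lam, -gamma     # Y2 = lam*Y3 + gamma*Y4 + eps2
# Residual variances chosen so that every Y_i has variance one.
Delta = np.diag([1 - (alpha**2 + delta**2 + 2 * alpha * delta * gamma),
                 1 - (lam**2 + gamma**2),
                 1.0,
                 1.0])
Ainv = np.linalg.inv(A)
Sigma = Ainv @ Delta @ Ainv.T

# Y4 unobserved: keep only (Y1, Y2, Y3).
S = Sigma[:3, :3]

# MAG parameter: coefficient of Y2 in the least-squares regression of Y1 on (Y2, Y3).
mag_coef = np.linalg.solve(S[1:, 1:], S[1:, 0])[0]
theta = gamma * delta / (1 - lam**2)

# Instrumental-variable ratio with Y3 as instrument for Y2: rho_13 / rho_23.
iv_coef = S[0, 2] / S[1, 2]

print(f"MAG: {mag_coef:.4f}  alpha + theta: {alpha + theta:.4f}  IV: {iv_coef:.4f}")
```

With these values the MAG coefficient is 0.62, while the generating dependence is 0.3, which the instrumental-variable ratio recovers.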

In general, possible distortions due to direct confounding in parameters of dependence in MAG models are recognized in the corresponding summary graph by a double edge for the pair $i, k$, that is, an arrow $i \leftarrow k$ together with a dashed $ik$-line. In the following example of standardized Gaussian variables, there is no direct confounding of the generating dependence $\alpha$, but there is indirect confounding of $\alpha$, while $\lambda$ remains undistorted.

To simplify the figures, the coefficient attached to $2 \leftarrow 3$ is not displayed in any of the three graphs of Figure 6. The generating graph in Figure 6(a) is directed and acyclic, so that the corresponding linear equations in standardized Gaussian variables, defined implicitly by Figure 6(a), have uncorrelated residuals. The example is adapted from Robins and Wasserman (1997). The summary graph in Figure 6(b) shows with a dashed line the induced association for the pair $Y_1, Y_3$ that results by marginalising $f_V$ over $Y_5$.

The equations of the summary graph model, obtained for $Y_5$ unobserved, have precisely one pair of correlated residuals, $\mathrm{cov}(\eta_1, \eta_3) = \gamma\delta$, and

$$ Y_1 = \lambda Y_2 + \alpha Y_4 + \eta_1, \quad Y_2 = \rho_{23} Y_3 + \eta_2, \quad Y_3 = \tau Y_4 + \eta_3, \quad Y_4 = \eta_4. $$

The summary graph model preserves both $\lambda$ and $\alpha$ as equation parameters.
In the corresponding MAG model, represented by the graph in Figure 6(c), the equation parameters associated with arrows present in the graph are unconstrained linear least-squares regression coefficients. These coefficients, expressed in terms of the generating parameters of Figure 6(a), are shown next to the arrows in Figure 6(c). Thus, the generating coefficient $\lambda$ is preserved, while $\alpha$ is changed into $\alpha - \tau\theta$, with $\theta = \gamma\delta/(1 - \tau^2)$.
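A similar check for Figure 6, with one caveat: the figure itself is not reproduced here, so the generating equations in the sketch are an assumption chosen to agree with the summary graph model equations above, namely $Y_1 = \lambda Y_2 + \alpha Y_4 + \delta Y_5 + \varepsilon_1$, $Y_2 = \rho_{23}Y_3 + \varepsilon_2$ and $Y_3 = \tau Y_4 + \gamma Y_5 + \varepsilon_3$, with exogenous $Y_4$ and $Y_5$. The parameter values are hypothetical, and the helper `standardized_sigma` is illustrative: it builds the covariance matrix recursively so that every variable has unit variance.

```python
import numpy as np

def standardized_sigma(B):
    """Covariance of Y_i = sum_k B[i, k] Y_k + eps_i (k > i), with each residual
    variance chosen so that var(Y_i) = 1."""
    d = B.shape[0]
    Sigma = np.zeros((d, d))
    for i in range(d - 1, -1, -1):            # Y_d first, Y_1 last
        past = np.arange(i + 1, d)
        b = B[i, past]
        S_pp = Sigma[np.ix_(past, past)]
        assert b @ S_pp @ b < 1, "coefficients too large for a standardized system"
        Sigma[i, past] = Sigma[past, i] = b @ S_pp   # cov(Y_i, Y_past)
        Sigma[i, i] = 1.0
    return Sigma

lam, alpha, delta, rho23, tau, gamma = 0.4, 0.3, 0.5, 0.6, 0.5, 0.6

# Assumed generating equations (index j stands for Y_{j+1}).
B = np.zeros((5, 5))
B[0, 1], B[0, 3], B[0, 4] = lam, alpha, delta   # Y1 on Y2, Y4, Y5
B[1, 2] = rho23                                 # Y2 on Y3
B[2, 3], B[2, 4] = tau, gamma                   # Y3 on Y4, Y5
Sigma = standardized_sigma(B)

# MAG parameters: least-squares regression of Y1 on (Y2, Y3, Y4), with Y5 unobserved.
expl = [1, 2, 3]
coef = np.linalg.solve(Sigma[np.ix_(expl, expl)], Sigma[expl, 0])
theta = gamma * delta / (1 - tau**2)
print(np.round(coef, 6), (lam, theta, alpha - tau * theta))
```

The coefficient of $Y_2$ returns $\lambda$ unchanged, whereas the coefficient of $Y_4$ becomes $\alpha - \tau\theta$, here $0.3 - 0.5 \cdot 0.4 = 0.1$.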

Direct confounding of a generating dependence of $Y_i$ on $Y_k$ is avoided in intervention studies, such as experiments and controlled clinical trials, by randomized allocation of individuals to the levels of $Y_k$, but severe indirect confounding may occur nevertheless; see Wermuth and Cox (2008).

Let the set of ancestors of node $i$ in $G^V_{\mathrm{par}}$ be denoted by $\mathrm{anc}_i$. Then the set of ancestors of node $i$ in $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ within $u$ is $c_i = u \cap \mathrm{anc}_i$, since no additional ancestor of $i$ is ever generated within $u$. Hence, by conditioning $Y_i$ on $Y_v$ and $Y_{c_i}$, one marginalises implicitly over the nodes in the set $m_i = \{\{1, \ldots, i\}, \{u \cap \mathrm{pst}_i \setminus c_i\}\}$, and indirect confounding may result.

Corollary 3 (Lack of confounding in measures of conditional dependence). A generating dependence $i \leftarrow k$ present in $G^V_{\mathrm{par}}$ is undistorted in the MAG model in nodes $V \setminus \{C, M\}$: (1) by direct confounding if in $G^V_{\mathrm{par}}$ there is no active $ik$-path relative to $\{C, M\}$, and (2) by indirect confounding if in $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ there is no active $ik$-path relative to $\{c_i, m_i\}$.

In distributions generated over $G^V_{\mathrm{par}}$, every active path is association-inducing; hence a generating dependence will be confounded unless the distortion is cancelled by other edge-inducing paths. Whether a distortion is judged to be severe depends on the subject-matter context. To detect indirect confounding, we call $k$ a forefather of $i$ if it is an ancestor but not a parent of $i$; in (1.5), three dots indicate more edges and nodes of the same type.

Lemma 2 (A graphical criterion [Wermuth and Cox (2008)]). For $i \leftarrow k$ of $G^V_{\mathrm{par}}$, indirect confounding in the absence of direct confounding is generated in the MAG model by marginalising over $M = \{l, l+1, \ldots, d_V\}$ with $l > k$ if and only if, in the corresponding summary graph $G^{V\setminus M}_{\mathrm{sum}}$, which is without double edges, the associations for $Y_i, Y_k$ that result by conditioning on all ancestors of node $i$ do not cancel, that is, those arising from the following types of collision $ik$-paths that have as inner nodes only forefathers of node $i$:

$$ i \;{-}{-}{-}\; \circ \;{-}{-}{-}\; \cdots \;{-}{-}{-}\; \circ \;\leftarrow\; k, \qquad i \;{-}{-}{-}\; \circ \;{-}{-}{-}\; \cdots \;{-}{-}{-}\; k. \qquad (1.5)$$

An example of such a path of indirect confounding is given with Figure 6(b) above, where, for $1 \leftarrow 4$, it is the path $1 \;{-}{-}{-}\; 3 \leftarrow 4$.

In the following two sections, we give further preliminary results and those proofs of 
new results for which we use more technical arguments. 
2. Further preliminary results

The edge matrix $\mathcal{A}$ of a parent graph is a $d_V \times d_V$ unit upper-triangular matrix, that is, a matrix with ones along the diagonal and zeros in the lower triangular part, such that for $i < k$, element $\mathcal{A}_{ik}$ of $\mathcal{A}$ satisfies

$$ \mathcal{A}_{ik} = 1 \iff i \leftarrow k \text{ in } G^V_{\mathrm{par}}. \qquad (2.1)$$

Because of the triangular form of the edge matrix $\mathcal{A}$ of $G^V_{\mathrm{par}}$, a density $f_V$ generated over a given parent graph has also been called a triangular system of densities.

2.1. Linear triangular systems 

A linear triangular system is given by a set of recursive linear equations for a mean-centered random vector variable $Y$ of dimension $d_V \times 1$ having $\mathrm{cov}(Y) = \Sigma$, that is, by

$$ AY = \varepsilon, \qquad (2.2)$$

where $A$ is a real-valued $d_V \times d_V$ unit upper-triangular matrix, given by

$$ E_{\mathrm{lin}}(Y_i \mid Y_{i+1} = y_{i+1}, \ldots, Y_{d_V} = y_{d_V}) = -A_{i,\mathrm{par}_i}\,y_{\mathrm{par}_i}, $$

and $E_{\mathrm{lin}}(\cdot)$ denotes a linear predictor. The random vector $\varepsilon$ of residuals has zero mean and $\mathrm{cov}(\varepsilon) = \Delta$, a diagonal matrix. A Gaussian triangular system of densities is generated if the distribution of each residual $\varepsilon_i$ is Gaussian, and the corresponding joint Gaussian family varies fully if $\Delta_{ii} > 0$ for all $i$.

The covariance and concentration matrix of $Y$ are, respectively, using $(A^{-1})^T = A^{-T}$,

$$ \Sigma = A^{-1}\Delta A^{-T}, \qquad \Sigma^{-1} = A^T\Delta^{-1}A. \qquad (2.3)$$

Linear independences that constrain the equations (2.2) are defined by zeros in the triangular decomposition, $(A, \Delta^{-1})$, of the concentration matrix. For joint Gaussian distributions,

$$ A_{ik} = 0 \iff i \perp\!\!\!\perp k \mid \mathrm{par}_i \quad\text{for } k \in \mathrm{pst}_i \setminus \mathrm{par}_i. $$

For Gaussian triangular systems generated over $G^V_{\mathrm{par}}$, the edge matrix $\mathcal{A}$ of $G^V_{\mathrm{par}}$ coincides with the indicator matrix of the non-zero entries in $A$, that is, $\mathcal{A} = \mathrm{In}[A]$, where $\mathrm{In}[\cdot]$ changes every non-zero entry of a matrix into a one. Furthermore, since the parent graph in node set $V$ is edge-minimal for $f_V$, we have

$$ \mathcal{A}_{ik} = 0 \iff A_{ik} = 0. $$
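Equation (2.3) and the link between zeros of $A$ and conditional independences can be illustrated with NumPy. In the sketch below, the matrices are hypothetical and `triangular_decomposition` is an illustrative helper, not a library function: it recovers each row of $A$ from $\Sigma$ by the population least-squares regression of $Y_i$ on its past, so that the zero $A_{13} = 0$ reappears as a zero regression coefficient.

```python
import numpy as np

# Hypothetical triangular system A Y = eps on V = (1, 2, 3, 4); index j stands for Y_{j+1}.
A = np.array([[1.0, -0.5, 0.0, 0.3],
              [0.0, 1.0, -0.7, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.0, 1.0]])
Delta = np.diag([0.5, 1.0, 0.8, 2.0])

Ainv = np.linalg.inv(A)
Sigma = Ainv @ Delta @ Ainv.T             # covariance matrix, (2.3)
K = A.T @ np.linalg.inv(Delta) @ A        # concentration matrix, (2.3)

def triangular_decomposition(Sigma):
    """Recover (A, Delta) by regressing each Y_i on its past Y_{i+1}, ..., Y_d."""
    d = Sigma.shape[0]
    A_hat, delta_hat = np.eye(d), np.zeros(d)
    for i in range(d):
        past = np.arange(i + 1, d)
        delta_hat[i] = Sigma[i, i]
        if past.size:
            beta = np.linalg.solve(Sigma[np.ix_(past, past)], Sigma[past, i])
            A_hat[i, past] = -beta
            delta_hat[i] -= Sigma[i, past] @ beta   # residual variance
    return A_hat, np.diag(delta_hat)

A_hat, Delta_hat = triangular_decomposition(Sigma)
print(np.round(A_hat, 6))   # the zero for the pair Y1, Y3 reappears
```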

Edge matrices expressed in terms of components of a set of given generating edge matrices are called induced. Simple examples of edge matrices induced by $\mathcal{A}$ of (2.1) are those of the overall covariance and the overall concentration graph; see Wermuth and Cox (2004). These two types of graphs have as induced edge matrices, respectively,

$$ \mathcal{S}_{VV} = \mathrm{In}[\mathcal{A}^-(\mathcal{A}^-)^T] \quad\text{and}\quad \mathcal{S}^{VV} = \mathrm{In}[\mathcal{A}^T\mathcal{A}], \qquad (2.4)$$

where $\mathcal{A}^-$ has all ones of $\mathcal{A}$ and an additional one in position $(i, k)$ if and only if $k$ is a forefather of node $i$ in $G^V_{\mathrm{par}}$. In the graph with edge matrix $\mathcal{A}^-$, every forefather $k$ of $i$ is turned into a parent, that is, $i \leftarrow k$ is inserted.

By writing the two matrix products in (2.4) explicitly, one sees that for an uncoupled node pair $i, k$ in the parent graph, there is an additional edge in the induced concentration graph of $Y_V$ if and only if the pair has a common offspring in $G^V_{\mathrm{par}}$. With a zero in position $(i, k)$ of $\mathcal{A}^-$, there is an additional $ik$-edge in the induced covariance graph if and only if the uncoupled pair has a common parent in the directed graph with edge matrix $\mathcal{A}^-$.
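A small example makes (2.4) and the two rules above concrete. The sketch assumes the hypothetical parent graph with arrows $1 \leftarrow 2$, $1 \leftarrow 3$ and $2 \leftarrow 4$, in which node 4 is a forefather of node 1 and nodes 2 and 3 have the common offspring 1. The helper names are illustrative; $\mathcal{A}^-$ is obtained by repeatedly extending directed paths by one arrow until nothing changes.

```python
import numpy as np

def edge_matrix(parents, d):
    """Unit upper-triangular edge matrix: A[i, k] = 1 iff k is a parent of i (0-based)."""
    A = np.eye(d, dtype=int)
    for i, pa in parents.items():
        for k in pa:
            if k <= i:
                raise ValueError("parents must come later in the ordering")
            A[i, k] = 1
    return A

def ancestor_closure(A):
    """A^-: a one at (i, k) whenever k is an ancestor of i (parents and forefathers)."""
    closed = A.astype(bool)
    while True:
        step = (closed.astype(int) @ closed.astype(int)) > 0   # extend paths by one arrow
        if np.array_equal(step, closed):
            return closed.astype(int)
        closed = step

def indicator(M):
    """In[M]: replace every non-zero entry by one."""
    return (M != 0).astype(int)

# Hypothetical parent graph: 1 <- 2, 1 <- 3, 2 <- 4 (node j is index j - 1).
A = edge_matrix({0: [1, 2], 1: [3]}, d=4)
A_minus = ancestor_closure(A)
S_cov = indicator(A_minus @ A_minus.T)   # overall covariance graph, S_VV
S_con = indicator(A.T @ A)               # overall concentration graph, S^VV
print(S_cov)
print(S_con)
```

The concentration graph contains the edge $2, 3$ because of the common offspring 1, and the covariance graph contains the edge $1, 4$ because 4 is a forefather of 1; the pairs $2, 3$ and $3, 4$ share no ancestor and stay uncoupled in the covariance graph.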

Both of these induced matrices are symmetric. The covariance and the concentration matrix, implied by a linear triangular system and given in (2.3), contain all zeros present in the corresponding induced edge matrices, but possibly more. This happens for $(i, k)$ whenever the associations that are induced for $Y_i, Y_k$ by several edge-inducing $ik$-paths cancel precisely. For such particular parametric constellations in Gaussian distributions generated over parent graphs, see Wermuth and Cox (1998). In data analyses, near cancellations are encountered frequently.

By contrast, the induced edge matrices capture consequences of the generating independence structure. They contain structural zeros. These are zeros that occur for all permissible parametrisations or, expressed differently, that occur for each member of a family $f_V$ generated over a given $G^V_{\mathrm{par}}$.

For distributions generated over parent graphs, a zero in position $(i, k)$ of $\mathcal{S}_{VV}$ and of $\mathcal{S}^{VV}$ means, respectively, that

$$ i \perp\!\!\!\perp k, \qquad i \perp\!\!\!\perp k \mid V \setminus \{i, k\} \qquad (2.5)$$

is implied by $G^V_{\mathrm{par}}$. Thus, in contrast to the global Markov property, the induced graphs answer all queries concerning sets of these two types of independence statements at once.

More complex induced edge matrices arise, for instance, in regression graphs and in summary graphs derived from $\mathcal{A}$. For transformations of linear systems, we use the operator called partial inversion, which is introduced next; for proofs and discussions, see Wermuth, Wiedenbeck and Cox (2006), Marchetti and Wermuth (2009) and Wiedenbeck and Wermuth (2010).

2.2. Partial inversion 

Let $F$ be a square matrix of dimension $d_V$ with principal submatrices that are all invertible. This holds for every $A$ of (2.2) and for every covariance matrix of a Gaussian distribution that varies fully, so that $Y$ has no degenerate component.
For any subset $a$ of $V$ and $b = V \setminus a$, by applying partial inversion to the linear equations $FY = \eta$, say, these are modified into

$$ \mathrm{inv}_a F \begin{pmatrix} \eta_a \\ Y_b \end{pmatrix} = \begin{pmatrix} Y_a \\ \eta_b \end{pmatrix}. \qquad (2.6)$$

By applying partial inversion to $b$ of $V$ in equation (2.6), one obtains $Y = F^{-1}\eta$. Thus, full inversion is decomposed into two steps of partial inversion.

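Partial inversion extends the sweep operator for symmetric, invertible matrices to non-symmetric matrices $F$:

$$ \mathrm{inv}_a F = \begin{pmatrix} F_{aa}^{-1} & -F_{aa}^{-1}F_{ab} \\ F_{ba}F_{aa}^{-1} & F_{bb.a} \end{pmatrix}, \quad\text{with}\quad F_{bb.a} = F_{bb} - F_{ba}F_{aa}^{-1}F_{ab}. \qquad (2.7)$$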

Lemma 3 (Some properties of partial inversion [Wermuth, Wiedenbeck and Cox (2006)]). Partial inversion is commutative, can be undone and is exchangeable with selecting a submatrix. For $V$ partitioned as $V = \{a, b, c, d\}$:

(1) $\mathrm{inv}_a\mathrm{inv}_b F = \mathrm{inv}_b\mathrm{inv}_a F$,

(2) $\mathrm{inv}_{ab}\mathrm{inv}_{bc} F = \mathrm{inv}_{ac} F$,

(3) $[\mathrm{inv}_a F]_{J,J} = \mathrm{inv}_a F_{JJ}$ for $J = \{a, b\}$.
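The operator (2.7) and the properties listed in Lemma 3 can be checked numerically. The following Python sketch is an assumed implementation written directly from (2.7), with 0-based index lists; `partial_inversion` is an illustrative name, not a library function. The test matrix is a hypothetical strictly diagonally dominant matrix, which guarantees invertible principal submatrices.

```python
import numpy as np

def partial_inversion(F, a):
    """inv_a F as in (2.7); a holds 0-based indices, b is the complement."""
    F = np.asarray(F, dtype=float)
    a = np.array(sorted(set(a)), dtype=int)
    b = np.setdiff1d(np.arange(F.shape[0]), a)
    if a.size == 0:
        return F.copy()
    Faa_inv = np.linalg.inv(F[np.ix_(a, a)])
    Fab, Fba = F[np.ix_(a, b)], F[np.ix_(b, a)]
    out = np.empty_like(F)
    out[np.ix_(a, a)] = Faa_inv
    out[np.ix_(a, b)] = -Faa_inv @ Fab
    out[np.ix_(b, a)] = Fba @ Faa_inv
    out[np.ix_(b, b)] = F[np.ix_(b, b)] - Fba @ Faa_inv @ Fab
    return out

# Hypothetical strictly diagonally dominant F: every principal submatrix is invertible.
rng = np.random.default_rng(3)
F = rng.uniform(-1, 1, size=(4, 4)) + 4 * np.eye(4)
a, b, c = [0], [1], [2]

commutes = np.allclose(partial_inversion(partial_inversion(F, a), b),
                       partial_inversion(partial_inversion(F, b), a))       # Lemma 3 (1)
two_step = np.allclose(partial_inversion(partial_inversion(F, b + c), a + b),
                       partial_inversion(F, a + c))                         # Lemma 3 (2)
J = [0, 1, 2]                                                               # J contains a
selects = np.allclose(partial_inversion(F, a)[np.ix_(J, J)],
                      partial_inversion(F[np.ix_(J, J)], a))                # Lemma 3 (3)
undone = np.allclose(partial_inversion(partial_inversion(F, a), a), F)      # inv_a twice
full = np.allclose(partial_inversion(F, range(4)), np.linalg.inv(F))        # inv_V F = F^{-1}
print(commutes, two_step, selects, undone, full)
```

All five checks print `True`; in particular, applying $\mathrm{inv}_a$ twice returns $F$.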



In contrast, the sweep operator cannot be undone; see Dempster (1972). Example 1 shows how the triangular equations in (2.2) are modified by partial inversion on $a$, where $a$ consists of the first $d_a$ components of $Y$. Instead of the full recursive order $V = (1, \ldots, d_V)$ with uncorrelated residuals, a block-recursive order $V = (a, b)$ results, where residuals within $a$ are correlated, but uncorrelated with the unchanged residuals within $b$.



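Example 1 (Partial inversion applied to a linear triangular system (2.2) with an order-respecting split of $V$). For $a = \{1, \ldots, d_a\}$, $b = \{d_a + 1, \ldots, d_V\}$,

$$ \mathrm{inv}_a A = \begin{pmatrix} A_{aa}^{-1} & -A_{aa}^{-1}A_{ab} \\ 0 & A_{bb} \end{pmatrix} \quad\text{gives}\quad Y_a = -A_{aa}^{-1}A_{ab}Y_b + A_{aa}^{-1}\varepsilon_a, $$

the implied form of linear least-squares regression of $Y_a$ on $Y_b$, where

$$ E_{\mathrm{lin}}(Y_a \mid Y_b = y_b) = \Pi_{a|b}\,y_b, \qquad Y_{a|b} = Y_a - \Pi_{a|b}Y_b, \qquad \mathrm{cov}(Y_{a|b}) = \Sigma_{aa|b} $$

and

$$ \Pi_{a|b} = -A_{aa}^{-1}A_{ab}, \qquad \Sigma_{aa|b} = A_{aa}^{-1}\Delta_{aa}A_{aa}^{-T}. $$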
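A numerical check of Example 1, with hypothetical $A$ and $\Delta$ and the split $a = \{1, 2\}$, $b = \{3, 4\}$: the blocks of $\mathrm{inv}_a A$ should reproduce the least-squares coefficient matrix $\Pi_{a|b}$ and the conditional covariance matrix $\Sigma_{aa|b}$ computed from $\Sigma$ of (2.3).

```python
import numpy as np

# Hypothetical triangular system (2.2) with d_V = 4; split a = {1, 2}, b = {3, 4}.
A = np.array([[1.0, -0.5, 0.2, 0.3],
              [0.0, 1.0, -0.7, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.0, 1.0]])
Delta = np.diag([0.5, 1.0, 0.8, 2.0])
a, b = [0, 1], [2, 3]

Ainv = np.linalg.inv(A)
Sigma = Ainv @ Delta @ Ainv.T

# Blocks obtained from inv_a A, as in Example 1.
Aaa_inv = np.linalg.inv(A[np.ix_(a, a)])
Pi_from_A = -Aaa_inv @ A[np.ix_(a, b)]                          # Pi_{a|b}
Sigma_aa_b_from_A = Aaa_inv @ Delta[np.ix_(a, a)] @ Aaa_inv.T   # Sigma_{aa|b}

# The same quantities from the covariance matrix by linear least squares.
Pi_from_Sigma = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
Sigma_aa_b = Sigma[np.ix_(a, a)] - Pi_from_Sigma @ Sigma[np.ix_(b, a)]
print(np.round(Sigma_aa_b, 6))
```

The off-diagonal entry 0.5 of $\Sigma_{aa|b}$ shows that the residuals within $a$ become correlated after partial inversion on $a$.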



Example 2 shows how the triangular equations contained in (2.2) are modified by partial inversion on $b$, where $V = (a, b, c)$, so that $b$ consists of intermediate components of $Y$. To use the matrix formulation in (2.7) directly, one sets $b := (a, c)$, $a := b$ and leaves components within $a$ and $b$ unchanged to obtain a rearranged matrix $A$, which is not block-triangular in $(a, b)$. After partial inversion of this matrix on $a$, the original order is restored for the results presented in Example 2.

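Example 2 (Partial inversion applied to a linear triangular system (2.2) for an order-respecting partitioning $V = (a, b, c)$). With $a = \{1, \ldots, d_a\}$, $b = \{d_a + 1, \ldots, d_a + d_b\}$ and $c = \{d_a + d_b + 1, \ldots, d_V\}$,

$$ \mathrm{inv}_b A = \begin{pmatrix} A_{aa} & A_{ab}A_{bb}^{-1} & A_{ac.b} \\ 0 & A_{bb}^{-1} & -A_{bb}^{-1}A_{bc} \\ 0 & 0 & A_{cc} \end{pmatrix} \quad\text{gives}\quad Y_a = -A_{aa}^{-1}A_{ac.b}Y_c + \eta_a, $$

with $A_{ac.b} = A_{ac} - A_{ab}A_{bb}^{-1}A_{bc}$, the implied form of the linear least-squares regression of $Y_a$ on $Y_c$, where

$$ \eta_a = A_{aa}^{-1}\varepsilon_a + \Pi_{a|b.c}A_{bb}^{-1}\varepsilon_b, \qquad \Pi_{a|bc} = (\Pi_{a|b.c}, \Pi_{a|c.b}) = -A_{aa}^{-1}(A_{ab}, A_{ac}). $$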

For $\Pi_{a|c}$, a special form of Cochran's recursive definition of regression coefficients results; see also Wermuth and Cox (2004):

$$ \Pi_{a|c} = \Pi_{a|c.b} + \Pi_{a|b.c}\Pi_{b|c} = -A_{aa}^{-1}(A_{ac} - A_{ab}A_{bb}^{-1}A_{bc}) = -A_{aa}^{-1}A_{ac.b}. $$

For $\mathrm{cov}(Y_{a|c})$, Anderson's recursive definition of covariance matrices results:

$$ \Sigma_{aa|c} = A_{aa}^{-1}\Delta_{aa}A_{aa}^{-T} + \Pi_{a|b.c}(A_{bb}^{-1}\Delta_{bb}A_{bb}^{-T})\Pi_{a|b.c}^T = \Sigma_{aa|bc} + \Sigma_{ab|c}\Sigma_{bb|c}^{-1}\Sigma_{ba|c}. $$

For $b, c$, the result in Example 2 is as in Example 1. For $Y_a$, the original recursive regressions given $Y_b, Y_c$ are modified into recursive regressions given only $Y_c$. The residuals of $Y_a$ and $Y_b$ are correlated, since $\mathrm{cov}(Y_{a|c}, Y_{b|c}) = \Sigma_{ab|c}$, but remain uncorrelated with those in $c$. In the modified equations, $Y_b$ can be removed without affecting any of the other remaining relations.
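The recursions of Cochran and Anderson in Example 2 can be verified on a randomly drawn triangular system. The seed, dimensions and split below are arbitrary choices for this sketch, and `coef` and `cond_cov` are illustrative helpers that compute population least-squares coefficients and conditional covariance matrices from $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = np.eye(d) + np.triu(rng.uniform(-0.8, 0.8, size=(d, d)), k=1)   # hypothetical A
Delta = np.diag(rng.uniform(0.5, 1.5, size=d))
Ainv = np.linalg.inv(A)
Sigma = Ainv @ Delta @ Ainv.T
a, b, c = [0], [1, 2], [3, 4]        # order-respecting split V = (a, b, c)

def coef(S, resp, expl):
    """Population least-squares coefficients Pi_{resp|expl} = S_re S_ee^{-1}."""
    return np.linalg.solve(S[np.ix_(expl, expl)], S[np.ix_(expl, resp)]).T

def cond_cov(S, resp, expl):
    """Conditional covariance matrix S_{resp resp | expl}."""
    return S[np.ix_(resp, resp)] - coef(S, resp, expl) @ S[np.ix_(expl, resp)]

# Cochran: Pi_{a|c} = Pi_{a|c.b} + Pi_{a|b.c} Pi_{b|c}.
Pi_a_bc = coef(Sigma, a, b + c)
Pi_a_b_c, Pi_a_c_b = Pi_a_bc[:, :len(b)], Pi_a_bc[:, len(b):]
cochran = Pi_a_c_b + Pi_a_b_c @ coef(Sigma, b, c)

# The same matrix from the triangular system: -A_aa^{-1} A_{ac.b}.
A_ac_b = A[np.ix_(a, c)] - A[np.ix_(a, b)] @ np.linalg.solve(A[np.ix_(b, b)], A[np.ix_(b, c)])
Pi_a_c_from_A = -np.linalg.solve(A[np.ix_(a, a)], A_ac_b)

# Anderson: Sigma_{aa|c} = Sigma_{aa|bc} + Sigma_{ab|c} Sigma_{bb|c}^{-1} Sigma_{ba|c}.
S_ab_c = cond_cov(Sigma, a + b, c)
na = len(a)
Saa_c, Sab_c, Sbb_c = S_ab_c[:na, :na], S_ab_c[:na, na:], S_ab_c[na:, na:]
anderson = cond_cov(Sigma, a, b + c) + Sab_c @ np.linalg.solve(Sbb_c, Sab_c.T)
print(np.round(cochran, 6), np.round(anderson, 6))
```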

For a more detailed discussion of the three different types of recursion relations of linear association measures due to Cochran, Anderson and Dempster, see Wiedenbeck and Wermuth (2010).

For Example 3, one starts with equation (2.2) premultiplied by $A^T\Delta^{-1}$ and obtains linear equations in which the equation parameter matrix coincides with the covariance matrix of the residuals, that is, one starts with

$$ \Sigma^{-1}Y = A^T\Delta^{-1}\varepsilon. \qquad (2.8)$$

Example 3 (Partial inversion with any split of $V$ applied to $\Sigma^{-1}$). The covariance matrix $\Sigma$ and the concentration matrix $\Sigma^{-1}$ of $Y$ are written, partitioned according to $(a, b)$ for $a$ any subset of $V$, as

$$ \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \cdot & \Sigma_{bb} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{aa} & \Sigma^{ab} \\ \cdot & \Sigma^{bb} \end{pmatrix}, $$

where the $\cdot$ notation indicates symmetric entries. Partial inversion of $\Sigma^{-1}$ on $a$ leads to three distinct components: $\Pi_{a|b}$, the population coefficient matrix of $Y_b$ in linear least-squares regression of $Y_a$ on $Y_b$; the covariance matrix $\Sigma_{aa|b}$ of $Y_{a|b}$; and the marginal concentration matrix $\Sigma^{bb.a}$ of $Y_b$:

$$ \mathrm{inv}_a\Sigma^{-1} = \begin{pmatrix} \Sigma_{aa|b} & \Pi_{a|b} \\ \sim & \Sigma^{bb.a} \end{pmatrix}, \qquad (2.9)$$

where the $\sim$ notation denotes entries that are symmetric except for the sign.

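Since (2.6) and (2.7) give $\mathrm{inv}_a\Sigma^{-1} = \mathrm{inv}_b\Sigma$ directly, several well-known dual expressions for the three submatrices in (2.9) result:

$$ \begin{pmatrix} (\Sigma^{aa})^{-1} & -(\Sigma^{aa})^{-1}\Sigma^{ab} \\ \sim & \Sigma^{bb} - \Sigma^{ba}(\Sigma^{aa})^{-1}\Sigma^{ab} \end{pmatrix} = \begin{pmatrix} \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} & \Sigma_{ab}\Sigma_{bb}^{-1} \\ \sim & \Sigma_{bb}^{-1} \end{pmatrix}, $$

where the explicit form of $\Sigma_{bb}^{-1} = \Sigma^{bb.a}$ is Dempster's recursive definition of concentration matrices.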
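Example 3 can be checked for a split that does not respect any ordering. The sketch below draws a hypothetical positive-definite $\Sigma$, partially inverts $\Sigma^{-1}$ on $a = \{1, 3\}$ with the block formulas of (2.7), and compares the blocks with $\Sigma_{aa|b}$, $\Pi_{a|b}$ and Dempster's $\Sigma_{bb}^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 5))
Sigma = X @ X.T + 5 * np.eye(5)   # hypothetical positive-definite covariance matrix
K = np.linalg.inv(Sigma)          # concentration matrix
a, b = [0, 2], [1, 3, 4]          # a split that respects no ordering

# Blocks of inv_a K, following (2.7).
Kaa_inv = np.linalg.inv(K[np.ix_(a, a)])
blk_aa = Kaa_inv
blk_ab = -Kaa_inv @ K[np.ix_(a, b)]
blk_ba = K[np.ix_(b, a)] @ Kaa_inv
blk_bb = K[np.ix_(b, b)] - K[np.ix_(b, a)] @ Kaa_inv @ K[np.ix_(a, b)]

# Quantities computed from Sigma directly.
Pi_ab = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
Sigma_aa_b = Sigma[np.ix_(a, a)] - Pi_ab @ Sigma[np.ix_(b, a)]
print(np.allclose(blk_aa, Sigma_aa_b), np.allclose(blk_ab, Pi_ab))
```

The lower-left block equals $-\Pi_{a|b}^T$, which is the "symmetric except for the sign" pattern marked by $\sim$ in (2.9).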



A more complex key result is that, for any block-triangular system of linear equations for $Y$, with equation parameter matrix $H$ and with possibly correlated residuals obtained from $W = \mathrm{cov}(HY)$, the implied form of $\mathrm{inv}_a\Sigma^{-1}$ can be expressed in terms of the partially inverted matrices $H$ and $W$.

Linear equations in a mean-centered vector variable $Y$ are block-triangular in two ordered blocks $(a, b)$, with a positive-definite $\Sigma^{-1} = H^TW^{-1}H$, if

$$ HY = \eta, \quad\text{with}\quad H_{ba} = 0, \quad E(\eta) = 0, \quad \mathrm{cov}(\eta) = W \text{ positive-definite}. \qquad (2.10)$$

For $K = \mathrm{inv}_a H$ and $Q = \mathrm{inv}_b W$, direct computations give

$$ \mathrm{inv}_a(H^TW^{-1}H) = \begin{pmatrix} K_{aa}Q_{aa}K_{aa}^T & K_{ab} + K_{aa}Q_{ab}H_{bb} \\ \sim & H_{bb}^TQ_{bb}H_{bb} \end{pmatrix}. \qquad (2.11)$$

A simple special case is the triangular linear system (2.2). Example 4 shows how regressions in blocks $(a, b)$ result from it.



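Example 4. For (2.10) with $H = A$ of (2.2), $W = \Delta$ diagonal and $a = \{1, \ldots, d_a\}$,

$$ \mathrm{inv}_a(H^T\Delta^{-1}H) = \begin{pmatrix} \Sigma_{aa|b} & \Pi_{a|b} \\ \sim & \Sigma^{bb.a} \end{pmatrix} = \begin{pmatrix} K_{aa}\Delta_{aa}K_{aa}^T & K_{ab} \\ \sim & A_{bb}^T\Delta_{bb}^{-1}A_{bb} \end{pmatrix}. $$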



Other special cases of linear block-triangular systems (2.10) are Gaussian summary 
graph models; see Section 3. 
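Equation (2.11) can be verified numerically for a block-triangular system with correlated residuals. In the sketch, $H$ and $W$ are random hypothetical matrices with $H_{ba} = 0$; only the blocks of $K = \mathrm{inv}_a H$ and $Q = \mathrm{inv}_b W$ that enter (2.11) are formed, using (2.7), and they are compared with $\Sigma_{aa|b}$, $\Pi_{a|b}$ and $\Sigma_{bb}^{-1}$ computed from $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, a, b = 5, [0, 1], [2, 3, 4]
H = np.eye(d) + 0.3 * rng.normal(size=(d, d))
H[np.ix_(b, a)] = 0.0                      # block-triangular: H_ba = 0, as in (2.10)
X = rng.normal(size=(d, d))
W = X @ X.T + np.eye(d)                    # hypothetical positive-definite cov(eta)

Sigma_inv = H.T @ np.linalg.inv(W) @ H     # concentration matrix of Y
Sigma = np.linalg.inv(Sigma_inv)

# Blocks of K = inv_a H and Q = inv_b W that enter (2.11).
Haa_inv = np.linalg.inv(H[np.ix_(a, a)])
K_aa, K_ab = Haa_inv, -Haa_inv @ H[np.ix_(a, b)]
Wbb_inv = np.linalg.inv(W[np.ix_(b, b)])
Q_aa = W[np.ix_(a, a)] - W[np.ix_(a, b)] @ Wbb_inv @ W[np.ix_(b, a)]
Q_ab, Q_bb = W[np.ix_(a, b)] @ Wbb_inv, Wbb_inv
H_bb = H[np.ix_(b, b)]

# Left-hand side of (2.11) computed from Sigma: Sigma_{aa|b}, Pi_{a|b}, Sigma_bb^{-1}.
Pi_ab = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
Sigma_aa_b = Sigma[np.ix_(a, a)] - Pi_ab @ Sigma[np.ix_(b, a)]
print(np.allclose(Pi_ab, K_ab + K_aa @ Q_ab @ H_bb))
```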
2.3. Partial closure

Let $\mathcal{F}$ be a binary edge matrix for node set $V = \{1, \ldots, d_V\}$ associated with $F$. The operator called partial closure transforms $\mathcal{F}$ into $\mathrm{zer}_a\mathcal{F}$ so that in the corresponding graph $a$-line paths of a special type become closed. For instance, applied to $\mathcal{A}$, every $a$-line ancestor of node $i$ is turned into a parent of $i$ and, applied to the edge matrix of an undirected graph, such as $\mathcal{S}^{VV}$, every $a$-line path is closed. Zeros in the new binary matrix $\mathrm{zer}_a\mathcal{F}$ are the structural zeros that remain of $\mathrm{inv}_a F$.

In matrix form, with $n - 1 = d_a$ and $\mathcal{I}_{aa}$ a $d_a \times d_a$ identity matrix,

$$ \mathrm{zer}_a\mathcal{F} = \mathrm{In}\left[\begin{pmatrix} \mathcal{F}^-_{aa} & \mathcal{F}^-_{aa}\mathcal{F}_{ab} \\ \mathcal{F}_{ba}\mathcal{F}^-_{aa} & \mathcal{F}_{bb} + \mathcal{F}_{ba}\mathcal{F}^-_{aa}\mathcal{F}_{ab} \end{pmatrix}\right], \qquad (2.12)$$

$$ \mathcal{F}^-_{aa} = \mathrm{In}[(n\mathcal{I}_{aa} - \mathcal{F}_{aa})^{-1}]. \qquad (2.13)$$

The inverse in (2.13) assures non-negative entries in $\mathcal{F}^-_{aa}$ and is a type of regularization; see Tikhonov (1963). It generalizes limits of scalar geometric series; see Neumann (1884), page 29.

Lemma 4 (Some properties of partial closure [Wermuth, Wiedenbeck and Cox (2006)]). Partial closure is commutative, cannot be undone and is exchangeable with selecting a submatrix. For $V$ partitioned as $V = \{a, b, c, d\}$:

(1) $\mathrm{zer}_a\mathrm{zer}_b\mathcal{F} = \mathrm{zer}_b\mathrm{zer}_a\mathcal{F}$,

(2) $\mathrm{zer}_{ab}\mathrm{zer}_{bc}\mathcal{F} = \mathrm{zer}_{abc}\mathcal{F}$,

(3) $[\mathrm{zer}_a\mathcal{F}]_{J,J} = \mathrm{zer}_a\mathcal{F}_{JJ}$ for $J = \{a, b\}$.

Given Gaussian parameter matrix components after partial inversion, such as in equation (2.11), the corresponding induced edge matrices are obtained using Lemma 5, provided each component matrix belongs to the model of the starting graph and the expressions are minimal, that is, condensed in such a way that they do not contain any parameter matrices that cancel, as, for instance, $A_{aa}A_{aa}^{-1}$ would.

Lemma 5 (Edges induced by a starting graph obtained with minimal matrix expressions of Gaussian parameter matrices [Marchetti and Wermuth (2009)]). Edge matrices replace corresponding parameter matrices after:

(1) changing each negative sign to a positive sign,

(2) replacing in the resulting expressions each diagonal matrix by an identity matrix, or deleting it if it arises within a matrix product, and then applying the indicator function.



For instance, the matrix formulation of partial closure in (2.12) can be viewed as arising from (2.7) by use of Lemma 5.
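Partial closure can be sketched in the same way. The Python function below is an assumed implementation of (2.12) and (2.13) for binary edge matrices with ones on the diagonal; the names are illustrative, and the small tolerance in `indicator` guards against floating-point round-off in the inverse of (2.13). Two hypothetical graphs are used: the parent graph $1 \leftarrow 2$, $1 \leftarrow 3$, $2 \leftarrow 4$, and the undirected path $1 - 2 - 3$.

```python
import numpy as np

def indicator(M, tol=1e-12):
    """In[M], treating entries below tol in absolute value as zero."""
    return (np.abs(M) > tol).astype(int)

def partial_closure(F, a):
    """zer_a F via (2.12)-(2.13) for a binary edge matrix F with ones on the diagonal."""
    F = np.asarray(F, dtype=float)
    a = np.array(sorted(set(a)), dtype=int)
    b = np.setdiff1d(np.arange(F.shape[0]), a)
    if a.size == 0:
        return indicator(F)
    n = a.size + 1                                                             # n - 1 = d_a
    F_minus = indicator(np.linalg.inv(n * np.eye(a.size) - F[np.ix_(a, a)]))  # (2.13)
    out = np.zeros(F.shape, dtype=int)
    out[np.ix_(a, a)] = F_minus
    out[np.ix_(a, b)] = indicator(F_minus @ F[np.ix_(a, b)])
    out[np.ix_(b, a)] = indicator(F[np.ix_(b, a)] @ F_minus)
    out[np.ix_(b, b)] = indicator(F[np.ix_(b, b)] + F[np.ix_(b, a)] @ F_minus @ F[np.ix_(a, b)])
    return out

# Parent graph 1 <- 2, 1 <- 3, 2 <- 4 (index j stands for node j + 1).
A = np.array([[1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
print(partial_closure(A, [1]))     # closing node 2 makes node 4 a parent of node 1

# Undirected path 1 - 2 - 3: closing node 2 joins nodes 1 and 3.
P = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
print(partial_closure(P, [1]))
```

Closing over all nodes of the parent graph gives the matrix $\mathcal{A}^-$ used in (2.4), in which every forefather is turned into a parent.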
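Example 1 (Continued). Let $\mathcal{K}_{aa} = \mathcal{A}^-_{aa}$ and $\mathcal{K}_{ab} = \mathcal{A}^-_{aa}\mathcal{A}_{ab}$. After partial closure of $G^V_{\mathrm{par}}$ on $a$, there are two induced edge matrix components: for directed edges, it is $\mathrm{zer}_a\mathcal{A}$, and for undirected dashed-line edges, it is $\mathcal{S}_{aa|b}$, where

$$ \mathrm{zer}_a\mathcal{A} = \mathrm{In}\begin{pmatrix} \mathcal{K}_{aa} & \mathcal{K}_{ab} \\ 0 & \mathcal{A}_{bb} \end{pmatrix}, \qquad \mathcal{P}_{a|b} = \mathrm{In}[\mathcal{K}_{ab}], \qquad \mathcal{S}_{aa|b} = \mathrm{In}[\mathcal{K}_{aa}\mathcal{K}_{aa}^T]. $$

The induced graph of these two components is a regression graph.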


Example 2 (Continued). By marginalising over the intermediate node set $b$ of $V = (a, b, c)$ in $G^V_{\mathrm{par}}$, a directed acyclic graph results. The induced Gaussian parameter and edge matrices are, for $N = V \setminus b$, respectively,

$$ [\mathrm{inv}_b A]_{N,N} = \begin{pmatrix} A_{aa} & A_{ac.b} \\ 0 & A_{cc} \end{pmatrix}, \qquad [\mathrm{zer}_b\mathcal{A}]_{N,N} = \mathrm{In}\begin{pmatrix} \mathcal{A}_{aa} & \mathcal{A}_{ac} + \mathcal{A}_{ab}\mathcal{A}^-_{bb}\mathcal{A}_{bc} \\ 0 & \mathcal{A}_{cc} \end{pmatrix}. $$

Example 3 (Continued). A concentration graph has, for joint Gaussian distributions, $\Sigma^{-1}$ as the parameter matrix and $\mathcal{S}^{VV}$ as the edge matrix. By partial closure of $\mathcal{S}^{VV}$ on $a$, given any split $V = \{a, b\}$, every $a$-line path is closed. Three edge matrix parts result: $\mathcal{S}_{aa|b}$, $\mathcal{P}_{a|b}$ and $\mathcal{S}^{bb.a}$. They give the structural zeros in the corresponding parameter matrices $\Sigma_{aa|b}$, $\Pi_{a|b}$ and $\Sigma^{bb.a}$. In general, the edge matrix $\mathcal{S}^{bb.a}$ is that of the marginal concentration graph of $Y_b$.

When the generating graph is $G^V_{\mathrm{par}}$, a concentration graph is induced for the node set that contains ancestors of $C$ outside $C$. In Example 4, the three components of $\mathrm{inv}_a\Sigma^{VV}$ are directly expressed in terms of the triangular decomposition $(A, \Delta^{-1})$.



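Example 4 (Continued). For the order-respecting split $V = (a, b)$, with $\mathcal{K}_{aa} = \mathcal{A}^-_{aa}$ and $\mathcal{K}_{ab} = \mathcal{A}^-_{aa}\mathcal{A}_{ab}$, a parent graph $G^V_{\mathrm{par}}$ induces a regression graph for $f_{a|b}$ and $f_b$ with the following three edge matrix components

$$ \begin{pmatrix} \mathcal{S}_{aa|b} & \mathcal{P}_{a|b} \\ \sim & \mathcal{S}^{bb.a} \end{pmatrix} = \mathrm{In}\begin{pmatrix} \mathcal{K}_{aa}\mathcal{K}_{aa}^T & \mathcal{K}_{ab} \\ \sim & \mathcal{A}_{bb}^T\mathcal{A}_{bb} \end{pmatrix}. \qquad (2.14)$$

The result combines the one in (2.4), in slightly modified form, with the above continuation of Example 1 by considering the consequences of a given parent graph for the distributions of $Y_a$ given $Y_b$ and of $Y_b$.

For the more complex generating graphs connected with block-triangular linear systems (2.10) and given edge matrices $\mathcal{H}, \mathcal{W}$, the three edge matrix components in the induced regression graph of just two components are, with $\mathcal{K} = \mathrm{zer}_a\mathcal{H}$ and $\mathcal{Q} = \mathrm{zer}_b\mathcal{W}$,

$$ \begin{pmatrix} \mathcal{S}_{aa|b} & \mathcal{P}_{a|b} \\ \sim & \mathcal{S}^{bb.a} \end{pmatrix} = \mathrm{In}\begin{pmatrix} \mathcal{K}_{aa}\mathcal{Q}_{aa}\mathcal{K}_{aa}^T & \mathcal{K}_{ab} + \mathcal{K}_{aa}\mathcal{Q}_{ab}\mathcal{H}_{bb} \\ \sim & \mathcal{H}_{bb}^T\mathcal{Q}_{bb}\mathcal{H}_{bb} \end{pmatrix}. \qquad (2.15)$$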
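From (2.15), for $a = \{\alpha, \delta\}$, the edge matrices induced by $G^V_{\mathrm{par}}$ for $f_{\alpha|b}$ are

$$ \mathcal{S}_{\alpha\alpha|b} = [\mathcal{S}_{aa|b}]_{\alpha,\alpha}, \qquad \mathcal{P}_{\alpha|b} = [\mathcal{P}_{a|b}]_{\alpha,b}, $$

and, with a split of $b$ as $\{\beta, \gamma\}$, the edge matrices induced for $f_{\beta|\gamma}$ and for the dependence of $Y_\alpha$ on $Y_\beta$ given $Y_\gamma$ are

$$ \mathcal{S}^{\beta\beta.a} = [\mathcal{S}^{bb.a}]_{\beta,\beta} \quad\text{and}\quad \mathcal{P}_{\alpha|\beta.\gamma} = [\mathcal{P}_{a|b}]_{\alpha,\beta}. $$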

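In general, the induced graphs of (2.14) or (2.15), with dashed lines for $\mathcal{S}_{aa|b}$, arrows for $\mathcal{P}_{a|b}$ and full lines for $\mathcal{S}^{bb.a}$, will not be independence-preserving graphs. In both graphs, the global Markov property of Lemma 1 implies the meaning of a missing $ik$-edge as

$$ i \perp\!\!\!\perp k \mid b \text{ in } \mathcal{S}_{aa|b}, \qquad i \perp\!\!\!\perp k \mid b \setminus k \text{ in } \mathcal{P}_{a|b}, \qquad i \perp\!\!\!\perp k \mid b \setminus \{i, k\} \text{ in } \mathcal{S}^{bb.a}. \qquad (2.16)$$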

Whenever every edge-inducing path is association-inducing, conditional dependences correspond to edges present in the graph in the resulting families of densities of $Y_{a|b}$ and $Y_b$, and also in a given member of the family, unless associations that are due to several edge-inducing paths cancel.



3. Summary graphs and associated models 
3.1. Gaussian summary graph models 

Starting from a Gaussian triangular system (2.2) generated over a parent graph in node set $V$, marginalising over $M$ and conditioning on $C$ gives a linear system of equations for $Y_{N|C}$, for $N = (u, v) = V \setminus \{C, M\}$, of the following form, where for the equations in the ancestors $v$ of $C$ that are outside of $C$, the equation parameter matrix and the covariance matrix coincide with a concentration matrix, as in (2.8).

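Definition 14 (Gaussian summary graph model). A Gaussian summary graph model is a system of equations $HY_{N|C} = \eta$ that is block-triangular and orthogonal in $(u, v)$ with

$$ \begin{pmatrix} H_{uu} & H_{uv} \\ 0 & \Sigma^{vv.uM} \end{pmatrix} \begin{pmatrix} Y_{u|C} \\ Y_{v|C} \end{pmatrix} = \begin{pmatrix} \eta_u \\ \zeta_v \end{pmatrix}, \qquad \mathrm{cov}\begin{pmatrix} \eta_u \\ \zeta_v \end{pmatrix} = \begin{pmatrix} W_{uu} & 0 \\ 0 & \Sigma^{vv.uM} \end{pmatrix}, \qquad (3.1)$$

where $H_{uu}$ is unit upper-triangular, $W_{uu}$ and $\Sigma^{-1}_{vv|C} = \Sigma^{vv.uM}$ are symmetric, and each of $\eta_u$ and $\zeta_v$ has a freely varying joint Gaussian distribution. The independence structure is given by a summary graph in node set $N$; see Definition 6 and Section 3.2 below.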

For $Y_{v|C}$, equation (3.1) specifies a Gaussian concentration graph model. These models were studied under the name of covariance selection by Dempster (1972); see also Speed and Kiiveri (1986). For each member of this family of models, the likelihood function has a unique maximum.

With $W_{uv} = 0$, the residuals of $Y_{u|C}$ and $Y_{v|C}$ are uncorrelated; therefore the system of equations (3.1) is said to be orthogonal in $(u, v)$. Because of this orthogonality, $\Pi_{u|v.C} = -H_{uu}^{-1}H_{uv}$ is the population least-squares regression coefficient matrix in linear regression of $Y_{u|C}$ on $Y_{v|C}$; see Example 1 above. In econometrics, the equation in $Y_{u|C}$ resulting from premultiplication of the first equation of (3.1) by $H_{uu}^{-1}$ is called the reduced form.

The equation in $Y_{u|C}$ of (3.1) can equivalently be written as a recursive system in endogenous variables $Y_{u|vC} = Y_{u|C} - \Pi_{u|v.C}Y_{v|C}$:

$$ H_{uu}Y_{u|vC} = \eta_u \quad\text{with}\quad \mathrm{cov}(\eta_u) = W_{uu}, \qquad (3.2)$$

where the equation parameter matrix $H_{uu}$ is, as in the linear triangular system (2.2), of unit upper-triangular form, but some of the residuals $\eta_u$ are correlated. For estimation, one speaks in econometrics of the endogeneity problem; see Drton, Eichler and Richardson (2009) for a recent discussion.

Identification is an issue for estimating the equation parameters $H_{uu}$ in (3.2). No necessary and sufficient condition is known yet; see Kang and Tian (2009). One general sufficient condition is the absence of any double edge in the summary graph; see Brito and Pearl (2002). This says that for any pair $i, k$ within $u$, either $H_{ik} = 0$, or $W_{ik} = 0$, or both hold.

However, some models with double edges in the summary graph correspond to identified instrumental variable models; see the above example for Figure 5(b). For the identifiability of latent variable models, which arise here via larger hypothesized generating processes, the notion of completeness is again relevant; see San Martín and Mouchart (2007).
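The sufficient condition above is an elementwise check on the edge matrices of $H_{uu}$ and $W_{uu}$. The hypothetical helper below applies it to the summary graphs of Figures 5(b) and 6(b), with edges read off the summary graph model equations given in Section 1.7; it is an illustration, not a general identification test.

```python
import numpy as np

def has_double_edge(H, W):
    """True if some pair i != k carries both an arrow (H_ik or H_ki non-zero)
    and a dashed line (W_ik non-zero)."""
    arrow = np.asarray(H) != 0
    arrow = arrow | arrow.T
    dashed = np.asarray(W) != 0
    both = arrow & dashed
    np.fill_diagonal(both, False)        # the diagonal carries no edges
    return bool(both.any())

# Figure 5(b): arrows 1 <- 2, 2 <- 3 and a dashed line between 1 and 2.
H5 = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
W5 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])

# Figure 6(b): arrows 1 <- 2, 1 <- 4, 2 <- 3, 3 <- 4 and a dashed line between 1 and 3.
H6 = np.array([[1, 1, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
W6 = np.eye(4, dtype=int)
W6[0, 2] = W6[2, 0] = 1

print(has_double_edge(H5, W5), has_double_edge(H6, W6))
```

The check flags the double edge for the pair 1, 2 of Figure 5(b), a model that is nevertheless identified as an instrumental variable model, and finds none in Figure 6(b).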



3.2. Generating $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ from $G^V_{\mathrm{par}}$

The summary graph $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ has four edge matrix components. With $\mathcal{S}^{vv.uM}$ a concentration graph results in node set $v$, with $\mathcal{H}_{uu}$ a directed acyclic graph within $u$, with $\mathcal{W}_{uu}$ a covariance graph of the residuals $\eta_u$ and with $\mathcal{H}_{uv}$ a bipartite graph for the dependence of $Y_{u|C}$ on $Y_{v|C}$.

Starting from a Gaussian triangular system (2.2) with parent graph $G^V_{\mathrm{par}}$, the choice of any conditioning set $C$ leads to an ordered split $V = (O, R)$, where we think of $R = \{C, F\}$ as the nodes to the right of $O$; see equation (3.3). Every node in $F$ is an ancestor of a node in $C$ outside $C$, so that we call $F$ the set of foster nodes of $C$. No node in $O$ has a descendant in $R$, so that $O$ is said to contain the outsiders of $R$. Equations, orthogonal and block-triangular in $(O, R)$, are, in unchanged order,



Aoo A OR \ ( Y o\^(e 
A RR \Y R \e R 



(3.3) 



After conditioning on $Y_C$ and marginalising over $Y_M$, the resulting system preserves block-triangularity and orthogonality with $u \subseteq O$, $v \subseteq F$.
Proposition 4 (Linear equations obtained from $AY = \varepsilon$ after conditioning on $Y_C$ and marginalising over $Y_M$). Given a Gaussian triangular system (2.2) generated over $G^V_{\mathrm{par}}$, conditioning set $C$, marginalising set $M = (p, q)$ with

$$ p = O \setminus u, \qquad q = F \setminus v, $$

and partially inverted parameter matrices arranged in the appropriate order,

$$ D = \mathrm{inv}_p A, \qquad \mathrm{inv}_q\Sigma^{FF.O} = \begin{pmatrix} \Sigma_{qq|vC} & \Pi_{q|v.C} \\ \sim & \Sigma^{vv.uM} \end{pmatrix}, $$

the induced linear equations (3.1) in $Y_{N|C}$ have equation parameters

$$ H_{uu} = D_{uu}, \qquad H_{uv} = D_{uv} + D_{uq}\Pi_{q|v.C}, \qquad \Sigma^{vv.uM} \qquad (3.4)$$

and covariance matrices

$$ W_{uu} = (\Delta_{uu} + D_{up}\Delta_{pp}D_{up}^T) + (D_{uq}\Sigma_{qq|vC}D_{uq}^T), \qquad \Sigma^{vv.uM}. \qquad (3.5)$$



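Proof. Equations (3.3) in $Y$ are first modified into equations for $Y_{O|C}$ and $Y_{F|C}$. As for Example 3 above, one takes $\zeta_R = A_{RR}^{\mathsf T}\Delta_{RR}^{-1}\varepsilon_R$. After noting that
$$\Sigma_{FF|C}^{-1} = [\Sigma^{RR.O}]_{F,F} = \Sigma^{FF.O},$$
and by the orthogonality in $(O, R)$, these equations can be written as
$$A_{OO}Y_{O|C} + A_{OF}Y_{F|C} = \varepsilon_O, \qquad \Sigma^{FF.O}Y_{F|C} = \zeta_F.$$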



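Partial inversion on $M = (p, q)$ gives, after appropriate ordering,
$$\mathrm{inv}_M \begin{pmatrix} A_{OO} & A_{OF} \\ 0 & \Sigma^{FF.O} \end{pmatrix} \begin{pmatrix} \varepsilon_p \\ Y_{u|C} \\ \zeta_q \\ Y_{v|C} \end{pmatrix} = \begin{pmatrix} Y_{p|C} \\ \varepsilon_u \\ Y_{q|C} \\ \zeta_v \end{pmatrix}, \qquad (3.6)$$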



where, after deleting the equations in $Y_{M|C}$, the uncorrelated residuals are
$$\eta_u = (\varepsilon_u - D_{up}\varepsilon_p) - D_{uq}\Sigma_{qq|vC}\zeta_q, \qquad \tilde\zeta_v = \zeta_v + \Pi_{q|v.C}^{\mathsf T}\zeta_q.$$
Thus, the equation parameter matrices of (3.4) and the covariance matrices of (3.5) result, where $\Sigma_{vv|C}^{-1} = \Sigma^{vv.qO} = \Sigma^{vv.uM}$. □

It is instructive to check the relations of the parameter matrices in (3.4) and (3.5) to regression coefficients and to conditional covariance matrices. With $\Pi_{u|R} = -D_{uu}^{-1}(D_{uv}, D_{uq}, D_{uC})$, one may write
$$-D_{uu}^{-1}H_{uv} = -D_{uu}^{-1}(D_{uv} + D_{uq}\Pi_{q|v.C}) = \Pi_{u|v.qC} + \Pi_{u|q.vC}\Pi_{q|v.C} = \Pi_{u|v.C},$$
$$H_{uu}Y_{u|C} + H_{uv}Y_{v|C} = D_{uu}(Y_{u|C} - \Pi_{u|v.C}Y_{v|C}) = D_{uu}Y_{u|vC},$$
and, for $W_{uu}$ defined in (3.2) and specialized in (3.5),
$$D_{uu}^{-1}W_{uu}D_{uu}^{-\mathsf T} = \Sigma_{uu|vqC} + \Pi_{u|q.vC}\Sigma_{qq|vC}\Pi_{u|q.vC}^{\mathsf T} = \Sigma_{uu|vC},$$
so that the required covariance matrix of $Y_{u|vC}$ is obtained.
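These relations can be checked numerically. The following is a minimal sketch, assuming NumPy. The dense parent graph, the node split and the helper names (`partial_inverse`, `sub`, `cond_cov`, `reg_coef`) are hypothetical choices for illustration and are not taken from the text. The sketch builds $\Sigma$ from $AY = \varepsilon$, forms $H_{uu}$, $H_{uv}$ and $W_{uu}$ as in (3.4) and (3.5), and compares them with regression coefficients and conditional covariance matrices computed directly from $\Sigma$.

```python
import numpy as np


def partial_inverse(M, a):
    """inv_a M: if M (y_a, y_b) = (z_a, z_b), then inv_a M (z_a, y_b) = (y_a, z_b)."""
    M = np.asarray(M, dtype=float)
    a = np.asarray(sorted(a), dtype=int)
    b = np.setdiff1d(np.arange(M.shape[0]), a)
    Maa_inv = np.linalg.inv(M[np.ix_(a, a)])
    N = np.empty_like(M)
    N[np.ix_(a, a)] = Maa_inv
    N[np.ix_(a, b)] = -Maa_inv @ M[np.ix_(a, b)]
    N[np.ix_(b, a)] = M[np.ix_(b, a)] @ Maa_inv
    N[np.ix_(b, b)] = M[np.ix_(b, b)] - M[np.ix_(b, a)] @ Maa_inv @ M[np.ix_(a, b)]
    return N


def sub(M, rows, cols):
    return M[np.ix_(rows, cols)]


def cond_cov(S, a, b):
    """Sigma_{aa|b}: covariance matrix of Y_a given Y_b."""
    return sub(S, a, a) - sub(S, a, b) @ np.linalg.solve(sub(S, b, b), sub(S, b, a))


def reg_coef(S, a, b):
    """Pi_{a|b}: coefficients of the least-squares regression of Y_a on Y_b."""
    return sub(S, a, b) @ np.linalg.inv(sub(S, b, b))


rng = np.random.default_rng(1)
n = 6
# Hypothetical dense parent graph: every node k > i is a parent of node i.
A = np.eye(n) + np.triu(rng.normal(size=(n, n)), k=1)
Delta = np.diag(rng.uniform(0.5, 2.0, size=n))
A_inv = np.linalg.inv(A)
Sigma = A_inv @ Delta @ A_inv.T            # cov(Y) implied by A Y = eps

# Hypothetical split: O = (p, u) = {0, 1, 2}, C = {3}, F = (q, v) = {4, 5}.
p, u, C, q, v = [1], [0, 2], [3], [4], [5]
F = q + v

D = partial_inverse(A, p)                            # D = inv_p A
K_FF = np.linalg.inv(cond_cov(Sigma, F, C))          # Sigma^{FF.O}
G = partial_inverse(K_FF, range(len(q)))             # inv_q Sigma^{FF.O}; q is listed first in F
S_qq_vC = G[:len(q), :len(q)]
Pi_q_vC = G[:len(q), len(q):]
S_vv_uM = G[len(q):, len(q):]

H_uu = sub(D, u, u)                                  # (3.4)
H_uv = sub(D, u, v) + sub(D, u, q) @ Pi_q_vC
W_uu = (sub(Delta, u, u) + sub(D, u, p) @ sub(Delta, p, p) @ sub(D, u, p).T
        + sub(D, u, q) @ S_qq_vC @ sub(D, u, q).T)   # (3.5)

H_inv = np.linalg.inv(H_uu)
print(np.allclose(-H_inv @ H_uv, reg_coef(Sigma, u, v + C)[:, :len(v)]))  # Pi_{u|v.C}
print(np.allclose(H_inv @ W_uu @ H_inv.T, cond_cov(Sigma, u, v + C)))    # Sigma_{uu|vC}
print(np.allclose(S_vv_uM, np.linalg.inv(cond_cov(Sigma, v, C))))        # Sigma^{vv.uM}
```

Any other unit upper-triangular $A$ and any split in which $R = \{C, F\}$ is ancestral should pass the same three checks.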

The summary graph in node set $N$, induced by the generating parent graph in node set $V$, now results directly from Lemma 5 applied to equations (3.4) and (3.5), as stated in Corollary 4.

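Corollary 4 (Generating the edge matrix of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ from the edge matrix of a parent graph). With the partially closed edge matrices $\mathcal{D} = \mathrm{zer}_p\mathcal{A}$ and $\mathrm{zer}_q\mathcal{S}^{FF.O}$, corresponding to Proposition 4 and arranged in the appropriate order, the induced edge matrix components of the summary graph $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ are
$$\mathcal{H}_{uu} = \mathcal{D}_{uu}, \qquad \mathcal{H}_{uv} = \mathrm{In}[\mathcal{D}_{uv} + \mathcal{D}_{uq}\mathcal{P}_{q|v.C}], \qquad \mathcal{S}^{vv.uM}, \qquad (3.7)$$
$$\mathcal{W}_{uu} = \mathrm{In}[(\mathcal{I}_{uu} + \mathcal{D}_{up}\mathcal{D}_{up}^{\mathsf T}) + (\mathcal{D}_{uq}\mathcal{S}_{qq|vC}\mathcal{D}_{uq}^{\mathsf T})]. \qquad (3.8)$$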
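The edge matrix version can be illustrated on very small graphs. The following is a minimal sketch of partial closure for binary edge matrices, assuming NumPy. It also assumes that the closure of $\mathcal{F}_{aa}$ is the reflexive-transitive closure, computed here with Warshall's algorithm, and that the blocks follow the pattern of partial inversion with signs dropped. The function names and the three-node graphs are hypothetical. With $q$ empty, the sketch reproduces the two simplest cases of (3.7) and (3.8): marginalising over a transition node induces an arrow in $\mathcal{H}_{uu}$, and marginalising over a source node induces a dashed line in $\mathcal{W}_{uu}$.

```python
import numpy as np


def reflexive_transitive_closure(F):
    """Boolean closure of a square 0/1 matrix (Warshall's algorithm), diagonal included."""
    R = (np.asarray(F) != 0) | np.eye(len(F), dtype=bool)
    for k in range(len(R)):
        R = R | (R[:, [k]] & R[[k], :])
    return R.astype(int)


def partial_closure(F, a):
    """zer_a F for a binary edge matrix F (assumed block pattern, see lead-in)."""
    F = (np.asarray(F) != 0).astype(int)
    a = np.asarray(sorted(a), dtype=int)
    b = np.setdiff1d(np.arange(F.shape[0]), a)
    Faa = reflexive_transitive_closure(F[np.ix_(a, a)])
    Z = np.zeros_like(F)
    Z[np.ix_(a, a)] = Faa
    Z[np.ix_(a, b)] = Faa @ F[np.ix_(a, b)]
    Z[np.ix_(b, a)] = F[np.ix_(b, a)] @ Faa
    Z[np.ix_(b, b)] = F[np.ix_(b, b)] + F[np.ix_(b, a)] @ Faa @ F[np.ix_(a, b)]
    return (Z > 0).astype(int)          # indicator function In[.]


I3 = np.eye(3, dtype=int)
# Hypothetical parent graph 0 <- 1 <- 2; entry [i, k] = 1 if k is a parent of i.
A1 = I3 + np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
D1 = partial_closure(A1, [1])           # marginalise over the transition node 1
print(D1[0, 2])                         # induced arrow 0 <- 2

# Hypothetical parent graph 0 <- 2 -> 1; marginalise over the source node p = {2}.
A2 = I3 + np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
p, u = [2], [0, 1]
D2 = partial_closure(A2, p)
D_up = D2[np.ix_(u, p)]
W_uu = (np.eye(2, dtype=int) + D_up @ D_up.T > 0).astype(int)   # (3.8) with q empty
print(D2[np.ix_(u, u)], W_uu, sep="\n")  # no arrow within u, but a dashed line 0 - - - 1
```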

3.3. Non-Gaussian models associated with summary graphs 

As noted before, the density $f_{N|C}$ of $Y_N$ given $Y_C$ is well defined, since it is obtained from a density of $Y_V$ generated over a parent graph by marginalising over $Y_M$ and conditioning on $Y_C$. As we have seen, this leads to the factorization of $f_{N|C}$ into $f_{u|vC}$ and $f_{v|C}$. The independence structure of $Y_v$ given $Y_C$ is captured by a concentration graph.

Corresponding models for discrete and continuous random variables have been studied 
by Lauritzen and Wermuth (1989), extending the Gaussian covariance selection models 
and the graphical, log-linear interaction models for discrete variables. Maximum likelihood estimation is considerably simplified for variation-independent parameters; see Frydenberg and Lauritzen (1989).

For a joint Gaussian density $f_V$, the induced density $f_{u|vC}$ is again Gaussian, but in general the form and parametrization of the density $f_{u|vC}$ induced by $f_V$ may be complex. Nevertheless, we conjecture that the parameters associated with $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ may often be obtained via the notional stepwise generating process described in Section 1.3, that is, by introducing latent variables that are mutually independent and independent of $Y_v, Y_C$.

If the additional latent variables are taken to be discrete and to have a large number of levels, then it should be possible to generate, or at least to approximate closely enough, any association corresponding to a dashed $ik$-line that does not depend systematically on third variables. For discrete variables, this follows from Theorem 1 of Holland and Rosenbaum (1989), and otherwise presumably by using Proposition 5.8 of Studený (2005), but a proof is pending.






3.4. Generating a summary graph from a larger summary graph 

Let a summary graph in node set $N'$ be given, where the corresponding model, actually or only notionally, arises from a parent graph model by conditioning on $Y_c$ and by marginalising over variables $Y_m$.

Then the starting linear parent graph model is the triangular system of equation (2.2) in a mean-centered Gaussian variable $Y$, where
$$AY = \varepsilon, \qquad \mathrm{cov}(\varepsilon) = \Delta \text{ diagonal}, \qquad A \text{ unit upper-triangular}.$$

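With Proposition 4, one obtains for $V \setminus \{c, m\} = (\mu, \nu)$ the following equations in $Y_{\mu|c}$, $Y_{\nu|c}$, which coincide in form with equation (3.1), with $B$ in place of $H$:
$$\begin{pmatrix} B_{\mu\mu} & B_{\mu\nu} \\ 0 & \Sigma^{\nu\nu.\mu m} \end{pmatrix} \begin{pmatrix} Y_{\mu|c} \\ Y_{\nu|c} \end{pmatrix} = \begin{pmatrix} \eta_\mu \\ \zeta_\nu \end{pmatrix}, \qquad \mathrm{cov}(\eta_\mu) = W_{\mu\mu}, \quad \mathrm{cov}(\zeta_\nu) = \Sigma^{\nu\nu.\mu m}. \qquad (3.9)$$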



With added conditioning on a set $c_\nu \subseteq \nu$, no additional ancestors of $c_\nu$ are defined, since every node in $\nu$ is already an ancestor of $c$. But, with added conditioning on $c_\mu \subseteq \mu$, the set $\mu \setminus c_\mu$ is split into the foster nodes $f_\mu$ of $c_\mu$ and into the outsiders $o$ of $\{r, \nu\}$, where $r = \{c_\mu, f_\mu\}$.

The equations for $Y_\mu$ are always block-triangular in $(o, r)$. But, in contrast to the split of $V$ into $(O, R)$ in equation (3.3), these equations are not orthogonal in $(o, r)$, so that conditioning on $c_\mu$ in the summary graph is more complex than conditioning directly on a set in the parent graph.

Proposition 5 (Linear equations obtained from (3.9) after conditioning on $Y_{c_\mu}$, $Y_{c_\nu}$ and marginalising over $Y_h$, $Y_l$). Given (3.9) for $G^{V\setminus\{c,m\}}_{\mathrm{sum}}$, where $o$ contains all outsiders of $\{c_\mu, f_\mu, \nu\}$, the equations for $Y_\mu$ are block-triangular in
$$\mu = (o, r), \qquad \text{where } r = \{c_\mu, f_\mu\}.$$

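The additional conditioning set $\{c_\mu, c_\nu\}$ and the additional marginalising sets $h \subseteq o$ and $l \subseteq \{f_\mu, \nu \setminus c_\nu\}$ give $C = \{c, c_\mu, c_\nu\}$ and $M = \{m, h, l\}$. With $\psi = (r, \nu)$, the new equations are block-triangular and orthogonal in $(u, v)$, where
$$u = o \setminus h, \qquad \phi = \psi \setminus \{c_\mu, c_\nu\}, \qquad v = \phi \setminus l.$$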

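With orthogonalised residuals $\tilde\eta_o = \eta_o - Q_{or}\eta_r$, orders $\mu = (h, u, r)$, $\phi = (l, v)$ and
$$Q_{\mu\mu} = \mathrm{inv}_r W_{\mu\mu}, \qquad C_{o\psi} = B_{o\psi} - Q_{or}B_{r\psi},$$
and with $K$ the partially inverted matrix of equation parameters used to marginalise over $Y_h$ and $Y_l$, the linear summary graph model to $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ has the form of (3.1) in $Y_{u|C}$, $Y_{v|C}$, with $K_{uu}$ and $\Sigma^{vv.uM}$ as the equation parameter matrices of $Y_{u|C}$ and $Y_{v|C}$, respectively, and with residuals
$$\eta_u = \tilde\eta_u - K_{uh}\tilde\eta_h - K_{ul}\Sigma_{ll|vC}\zeta_l, \qquad (3.10)$$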
and it coincides with the linear model obtained from the triangular system (2.2) by directly conditioning on $Y_C$ and marginalising over $Y_M$.

Proof. The conditioning set splits the set of nodes $\mu$ into $(o, r)$, where $o$ has no descendant in $r = \{c_\mu, f_\mu\}$ and every node in $f_\mu$ has a descendant in $c_\mu$. This implies a block-triangular form of $B_{\mu\mu}$ in $(o, r)$ in the equations of $Y_{\mu|c}$, however with correlated residuals $\eta_o$ and $\eta_r$.

For $\psi = (r, \nu)$, block-orthogonality with respect to $(o, \psi)$ in the equations in $Y_{o|c}$ and $Y_{\psi|c}$ is achieved by subtracting from $\eta_o$ the value predicted by the linear least-squares regression of $\eta_o$ on $\eta_r$ and $\zeta_\nu$. Because of the orthogonality of the equations in $(\mu, \nu)$, this reduces to subtracting $Q_{or}\eta_r$ from $\eta_o$.
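The orthogonalisation step is ordinary least-squares residualisation, and its effect can be checked from covariance matrices alone. The following is a minimal sketch, assuming NumPy and a hypothetical positive definite $W_{\mu\mu}$ with $o = \{0, 1\}$ and $r = \{2, 3\}$. Here $Q_{or} = W_{or}W_{rr}^{-1}$ is the $(o, r)$ block of $\mathrm{inv}_r W_{\mu\mu}$. The residuals $\zeta_\nu$ are left out because they are uncorrelated with $\eta_\mu$.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(4, 4))
W = L @ L.T + np.eye(4)            # hypothetical cov(eta_mu), mu = (o, r)
o, r = [0, 1], [2, 3]

# Q_or: coefficients of the least-squares regression of eta_o on eta_r.
Q_or = W[np.ix_(o, r)] @ np.linalg.inv(W[np.ix_(r, r)])

# Linear map from eta = (eta_o, eta_r) to tilde eta_o = eta_o - Q_or eta_r.
T = np.zeros((len(o), 4))
T[:, o] = np.eye(len(o))
T[:, r] = -Q_or

cov_tilde_o = T @ W @ T.T
cross_cov = T @ W[:, r]            # cov(tilde eta_o, eta_r)
print(np.allclose(cross_cov, 0))   # block-orthogonality in (o, r)
print(np.allclose(cov_tilde_o, W[np.ix_(o, o)] - Q_or @ W[np.ix_(r, o)]))  # [inv_r W]_{oo}
```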

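The matrix of equation parameters of $Y_{\psi|c}$ coincides with the concentration matrix of $Y_{\psi|c}$, given by
$$\Sigma^{\psi\psi.om} = \begin{pmatrix} B_{rr}^{\mathsf T}Q_{rr}B_{rr} & B_{rr}^{\mathsf T}Q_{rr}B_{r\nu} \\ B_{r\nu}^{\mathsf T}Q_{rr}B_{rr} & \Sigma^{\nu\nu.\mu m} + B_{r\nu}^{\mathsf T}Q_{rr}B_{r\nu} \end{pmatrix}.$$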

By the block-triangularity and orthogonality in $(o, \psi)$, the equations in $Y_{o|c}$ can be replaced by equations in $Y_{o|C}$. For the equations in $Y_{\phi|C}$, the matrix of equation parameters is $[\Sigma^{\psi\psi.om}]_{\phi,\phi} = \Sigma^{\phi\phi.om}$. The resulting equations give the Gaussian linear model to the summary graph in node set $V \setminus \{C, m\} = (o, \phi)$.
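The block formula for $\Sigma^{\psi\psi.om}$ can be confirmed by inverting the covariance matrix of $Y_\psi$ that the equations imply. The following is a minimal sketch, assuming NumPy. The matrices $B_{rr}$, $B_{r\nu}$, $W_{rr}$ and $\Sigma^{\nu\nu.\mu m}$ are hypothetical random choices, and $Q_{rr} = W_{rr}^{-1}$ is the $(r, r)$ block of $\mathrm{inv}_r W_{\mu\mu}$.

```python
import numpy as np

rng = np.random.default_rng(3)
nr, nnu = 2, 2
B_rr = np.eye(nr) + np.triu(rng.normal(size=(nr, nr)), k=1)   # hypothetical, unit upper-triangular
B_rnu = rng.normal(size=(nr, nnu))
L1, L2 = rng.normal(size=(nr, nr)), rng.normal(size=(nnu, nnu))
W_rr = L1 @ L1.T + np.eye(nr)          # cov(eta_r)
S_nunu = L2 @ L2.T + np.eye(nnu)       # Sigma^{nu nu.mu m}, also cov(zeta_nu)

# Equations in Y_psi, psi = (r, nu): G Y_psi = (eta_r, zeta_nu), with uncorrelated blocks.
G = np.block([[B_rr, B_rnu], [np.zeros((nnu, nr)), S_nunu]])
cov_resid = np.block([[W_rr, np.zeros((nr, nnu))], [np.zeros((nnu, nr)), S_nunu]])
G_inv = np.linalg.inv(G)
conc_psi = np.linalg.inv(G_inv @ cov_resid @ G_inv.T)   # concentration matrix of Y_psi

Q_rr = np.linalg.inv(W_rr)
claimed = np.block([[B_rr.T @ Q_rr @ B_rr, B_rr.T @ Q_rr @ B_rnu],
                    [B_rnu.T @ Q_rr @ B_rr, S_nunu + B_rnu.T @ Q_rr @ B_rnu]])
print(np.allclose(conc_psi, claimed))
```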

In the linear model to $G^{V\setminus\{C,m\}}_{\mathrm{sum}}$, marginalising over $Y_{h|C}$, where $h \subseteq o$, and over $Y_{l|C}$, where $l \subseteq \phi$, is achieved by partial inversion on $h, l$ of the block-triangular matrix of equation parameters and by keeping only the equations in $Y_{u|C}$ and $Y_{v|C}$.

In the resulting equation (3.10), one knows by the commutativity and exchangeability of partial inversion, for $m = (g, k)$, $p = \{g, h\}$ and $q = \{k, l\}$, that
$$K_{uu} = [\mathrm{inv}_h\,\mathrm{inv}_g A]_{u,u} = [\mathrm{inv}_p A]_{u,u},$$
so that $K_{uu} = D_{uu}$, where $D$ is defined for Proposition 4. Furthermore, by the properties of reduced form equations

so that the parameter matrices of $Y_{u|C}$ and $Y_{v|C}$ given in (3.10) coincide with those in (3.4) and (3.5) of Proposition 4; that is, they give the Gaussian linear model to the summary graph in node set $V \setminus \{C, M\} = (u, v)$. □

Since partial closure has the same exchangeability property as partial inversion and 
both operators are commutative, the same type of proof holds for the edge matrix ex- 
pression corresponding to (3.10). 
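Exchangeability and commutativity of partial inversion can be checked numerically. The following is a minimal sketch, assuming NumPy. The $5 \times 5$ matrix and the index sets are hypothetical, and only disjoint index sets are checked. The last two lines check that partial inversion on $a$ is undone by a second partial inversion on $a$, and that partial inversion on all indices gives the ordinary inverse.

```python
import numpy as np


def partial_inverse(M, a):
    """inv_a M: if M (y_a, y_b) = (z_a, z_b), then inv_a M (z_a, y_b) = (y_a, z_b)."""
    M = np.asarray(M, dtype=float)
    a = np.asarray(sorted(a), dtype=int)
    b = np.setdiff1d(np.arange(M.shape[0]), a)
    Maa_inv = np.linalg.inv(M[np.ix_(a, a)])
    N = np.empty_like(M)
    N[np.ix_(a, a)] = Maa_inv
    N[np.ix_(a, b)] = -Maa_inv @ M[np.ix_(a, b)]
    N[np.ix_(b, a)] = M[np.ix_(b, a)] @ Maa_inv
    N[np.ix_(b, b)] = M[np.ix_(b, b)] - M[np.ix_(b, a)] @ Maa_inv @ M[np.ix_(a, b)]
    return N


rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5)) + 10 * np.eye(5)   # hypothetical, comfortably invertible
a, b = [1, 3], [0, 4]

inv_ab = partial_inverse(partial_inverse(M, a), b)
print(np.allclose(inv_ab, partial_inverse(M, a + b)))                   # exchangeability
print(np.allclose(inv_ab, partial_inverse(partial_inverse(M, b), a)))   # commutativity
print(np.allclose(partial_inverse(partial_inverse(M, a), a), M))        # inv_a inv_a M = M
print(np.allclose(partial_inverse(M, range(5)), np.linalg.inv(M)))      # full inversion
```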

Corollary 5 (Generating the edge matrix of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ from the edge matrix of a summary graph). For $c \subseteq C$ and $m \subseteq M$, the edge matrix components of the summary graph $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ result from the edge matrix components $\mathcal{B}_{\mu\mu}$, $\mathcal{B}_{\mu\nu}$, $\mathcal{W}_{\mu\mu}$ and $\mathcal{S}^{\nu\nu.\mu m}$ of $G^{V\setminus\{c,m\}}_{\mathrm{sum}}$ by

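using the transformed edge matrices
$$\mathcal{Q}_{\mu\mu} = \mathrm{zer}_r\mathcal{W}_{\mu\mu}, \qquad \mathcal{C}_{o\psi} = \mathrm{In}[\mathcal{B}_{o\psi} + \mathcal{Q}_{or}\mathcal{B}_{r\psi}],$$
and the partially closed edge matrix $\mathcal{K}$ corresponding to $K$ of Proposition 5, to obtain $\mathcal{K}_{uu}$ and $\mathcal{K}_{uv}$ directly, $\mathcal{S}^{vv.uM}$ as the edge matrix to (3.11), and
$$\mathcal{W}_{uu} = \mathrm{In}[\mathcal{Q}_{uu} + \mathcal{K}_{uh}\mathcal{Q}_{hh}\mathcal{K}_{uh}^{\mathsf T} + \mathcal{K}_{ul}\mathcal{S}_{ll|vC}\mathcal{K}_{ul}^{\mathsf T}]. \qquad (3.12)$$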

3.5. Path results derived from edge matrix transformations 

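If one starts with the summary graph $G^{V\setminus\{c,m\}}_{\mathrm{sum}}$ and conditions by using Corollary 5, edges are induced by $r$-line collision paths, that is, by collision paths whose inner nodes are all in $r = \{c_\mu, f_\mu\}$:

(a) a dashed line between two nodes of $\mu$ results from a path of dashed lines with all inner nodes in $r$,

(b) a full line between two nodes of $\phi$ results from a path that starts and ends with an arrow pointing from $\phi$ into $r$, with only dashed lines between its inner nodes in $r$,

(c) an arrow pointing from a node of $\phi$ to a node of $o$ results from a path that starts at $o$ with a dashed line, has only dashed lines between its inner nodes in $r$, and ends with an arrow pointing from $\phi$ into $r$.

The corresponding edge matrix expressions are, respectively, $\mathcal{Q}_{\mu\mu} = \mathrm{zer}_r\mathcal{W}_{\mu\mu}$, $\mathrm{In}[\mathcal{B}_{r\phi}^{\mathsf T}\mathcal{Q}_{rr}\mathcal{B}_{r\phi}]$ and $\mathrm{In}[\mathcal{Q}_{or}\mathcal{B}_{r\phi}]$. For each pair, one keeps one edge of several of the same kind. The subgraph induced by nodes $(o, \phi)$ is $G^{V\setminus\{C,m\}}_{\mathrm{sum}}$.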
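The three conditioning steps can be traced on tiny edge matrices. The following is a minimal sketch, assuming NumPy. It uses the same reading of partial closure as a reflexive-transitive closure on the inverted block, and the dashed lines among $\mu = (0, 1, 2)$ with $r = \{1\}$, as well as the two arrows from $\phi$ into $r$, are hypothetical.

```python
import numpy as np


def reflexive_transitive_closure(F):
    """Boolean closure of a square 0/1 matrix (Warshall's algorithm), diagonal included."""
    R = (np.asarray(F) != 0) | np.eye(len(F), dtype=bool)
    for k in range(len(R)):
        R = R | (R[:, [k]] & R[[k], :])
    return R.astype(int)


def partial_closure(F, a):
    """zer_a F for a binary edge matrix F (assumed block pattern of partial inversion)."""
    F = (np.asarray(F) != 0).astype(int)
    a = np.asarray(sorted(a), dtype=int)
    b = np.setdiff1d(np.arange(F.shape[0]), a)
    Faa = reflexive_transitive_closure(F[np.ix_(a, a)])
    Z = np.zeros_like(F)
    Z[np.ix_(a, a)] = Faa
    Z[np.ix_(a, b)] = Faa @ F[np.ix_(a, b)]
    Z[np.ix_(b, a)] = F[np.ix_(b, a)] @ Faa
    Z[np.ix_(b, b)] = F[np.ix_(b, b)] + F[np.ix_(b, a)] @ Faa @ F[np.ix_(a, b)]
    return (Z > 0).astype(int)


def indicator(X):
    """In[.]: 1 where an entry is positive, 0 otherwise."""
    return (np.asarray(X) > 0).astype(int)


W = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])           # hypothetical dashed lines 0 - - - 1 - - - 2
o, r = [0, 2], [1]
Q = partial_closure(W, r)           # (a): Q_mumu = zer_r W_mumu
B_rphi = np.array([[1, 1]])         # hypothetical arrows from two phi-nodes into r

full_phi = indicator(B_rphi.T @ Q[np.ix_(r, r)] @ B_rphi)   # (b): full lines within phi
arrows_o = indicator(Q[np.ix_(o, r)] @ B_rphi)              # (c): arrows from phi to o
print(Q[0, 2], full_phi[0, 1], arrows_o, sep="\n")
```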

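By marginalising next over $m' = (h, l)$ in the graph $G^{V\setminus\{C,m\}}_{\mathrm{sum}}$, three types of edges are induced when closing $m'$-line transmitting paths:

(d) a full line between two nodes of $v$ results from a path of full lines whose inner nodes are all in $l$,

(e) an arrow results from a path of arrows, all pointing in the same direction, whose inner nodes are all in $h$,

(f) an arrow pointing from a node of $v$ to a node of $u$ results from a path that leads with arrows through inner nodes in $h$ to a node in $l$ and continues with full lines through $l$ to the node of $v$,

(g) a dashed line between two nodes of $u$ results from a path whose end edges are arrows pointing from $h$ to $u$ and whose inner part is a path of dashed lines within $h$,

(h) a dashed line between two nodes of $u$ results from a path whose end edges are arrows pointing from $l$ to $u$ and whose inner part is a path of full lines within $l$.

The corresponding edge matrix expressions are, respectively, $\mathcal{K}_{\phi\phi}$, $\mathcal{K}_{oo}$, $\mathcal{K}_{o\phi}$, $\mathrm{In}[\mathcal{K}_{uh}\mathcal{Q}_{hh}\mathcal{K}_{uh}^{\mathsf T}]$ and $\mathrm{In}[\mathcal{K}_{ul}\mathcal{S}_{ll|vC}\mathcal{K}_{ul}^{\mathsf T}]$. After keeping just one edge of several of the same kind, the subgraph induced by nodes $(u, v)$ is $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$.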

Notice that the effect of the indicator function is to reduce several edges of the same 
kind to just one. The closed form expressions of the edge matrix results imply that some 
of the paths are to be closed in the given order. 

The edge matrices $\mathrm{In}[\mathcal{Q}_{or}\mathcal{B}_{r\phi}]$ and $\mathcal{K}_{o\phi}$ correspond, in a Gaussian summary graph model, to orthogonalising, that is, to removing some residual correlations. By the associated steps, (c) or (f), $ik$-arrows may be generated for which node $k$ is not an ancestor of $i$ in the generating graph.

In contrast, for the outsiders of the conditioning set, such as the set $o$ in the summary graph in nodes $(o, \phi)$, there is an $ik$-arrow if and only if $k$ is a parent or a forefather of node $i$ in the larger generating parent graph, because the only arrow-inducing paths for the subset $o$ are those in (e).
Since a summary graph results after conditioning with steps (a)-(c) and also after marginalising with steps (d)-(h), summary graphs are said to be closed under marginal-
ising and conditioning and one may reverse the order of conditioning and marginalising. 
The following example illustrates such reversed stepwise constructions. 

Example 5 (Path constructions of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ for $M = q$ and $p = \emptyset$). The node set of the parent graph is $V = (1, \ldots, 8)$. The conditioning set is $C = \{2, 4\}$ and the marginalising set is $M = \{6, 7\}$. The foster nodes of $C$ are in $F = \{3, 5, 6, 7, 8\}$, and $u = O = \{1\}$, $v = \{3, 5, 8\}$.

In this example, with graphs in Figure 7, the summary graph model is equivalent to a triangular system in $N = (1, 3, 5, 8)$, even though $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ is not Markov equivalent to any directed acyclic graph, since it contains a chordless collision path from node 3 to node 8.

It is typical that further marginalising or conditioning may again lead to simpler graphs 
and models. 
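The split into foster nodes, outsiders and the sets $p, q, u, v$ only needs ancestor sets. The following is a minimal sketch in plain Python. The helper names and the six-node graph are hypothetical; the graph is not the one in Figure 7, whose edges are not listed in the text. The sketch assumes that $M$ lies within $O \cup F$.

```python
def ancestors(parents, nodes):
    """All ancestors of the given nodes in a DAG given as a node -> set of parents mapping."""
    found, stack = set(), list(nodes)
    while stack:
        for k in parents.get(stack.pop(), ()):
            if k not in found:
                found.add(k)
                stack.append(k)
    return found


def split_nodes(parents, C, M):
    """Foster nodes F, outsiders O and the sets p, q, u, v of Proposition 4."""
    C, M = set(C), set(M)
    F = ancestors(parents, C) - C          # ancestors of C outside C
    O = set(parents) - C - F               # no node in O has a descendant in R = {C, F}
    p, q = O & M, F & M
    return {"F": F, "O": O, "p": p, "q": q, "u": O - p, "v": F - q}


# Hypothetical parent graph (node -> parents), not the graph of Figure 7.
parents = {1: {2, 3}, 2: {4}, 3: {5}, 4: {6}, 5: set(), 6: set()}
result = split_nodes(parents, C={2}, M={3, 4})
print(result)
```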

With just one node in the marginalising set, the paths (d)-(h) have just two edges. In 
addition, by the properties of partial inversion and partial closure, the paths (a)-(c) can 
be closed by repeatedly closing paths of just two edges. This leads to operating on one 
node at a time in any order; see also the Appendix, Table 1 and Proposition 1. 



3.6. The MAG corresponding to $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ and local Markov properties

The keys to deriving the MAG corresponding to $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ are the definition of the variables in the Gaussian MAG model and the result (2.15). For $Y_v$, the summary graph and the MAG specify the same concentration graph, and the dependences corresponding to arrows pointing from $v$ to $u$ also coincide.

A full order of the nodes in $u$ of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ may sometimes be given by the arrows, as in Figure 3(b). Sometimes there is none, as in Figure 2(b). More often there is a partial order, as in Figure 1(d) or 7(c). Then one may take any compatible full ordering of the nodes in $u$ in which the ancestors within $u$ of each node $i$ in $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ are in the past of $i$, that is, in $\{i+1, \ldots, d_u\}$.
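A compatible full ordering is a reversed topological order of the arrows within $u$. The following is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The function name and the four-node arrow structure are hypothetical.

```python
from graphlib import TopologicalSorter


def past_compatible_order(parents_in_u):
    """Full order of the nodes in u in which every ancestor of node i comes after i."""
    # static_order() yields each node after all of its parents; reversing it
    # places parents, and hence all ancestors, later in the list, i.e. in the past.
    return list(reversed(list(TopologicalSorter(parents_in_u).static_order())))


# Hypothetical arrows within u, given as node -> parents within u.
arrows = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": set()}
order = past_compatible_order(arrows)
print(order)
```

Nodes without arrows between them, such as "d" here, may appear anywhere in the order, which matches the freedom left by a partial order.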




Figure 7. (a) The generating graph G^ ar , (b) G^™, (c) GLT M] , (d) 
For each node $i$, we let $c_i \subseteq \{i+1, \ldots, d_u\}$ denote the ancestors of $i$ in $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$, and $\bar c_i = \{i+1, \ldots, d_u\} \setminus c_i$. Next, for each node pair $i, k$ with $k$ in $c_i$ and each node pair $i, l$ with $l$ in $\bar c_i$, we derive the edges in the MAG corresponding to $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ by applying (2.15) to equation (3.2).

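For $a = (1, \ldots, i, \bar c_i)$ and $b = c_i$, the vector $\mathcal{P}_{i|b} = \mathrm{In}[\mathcal{K}_{ib} + \mathcal{Q}_{ib}\mathcal{K}_{bb}]$ gives zeros and ones for the dependence of $Y_i$ on $Y_{c_i}$ given $Y_{\bar c_i}$, $Y_v$ and $Y_C$, and

in the MAG, $i \leftarrow k$ for $[\mathcal{P}_{i|b}]_k = 1$, and $i, k$ are uncoupled otherwise. $\qquad (3.13)$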

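Similarly, for $i, l$ we let $e_{il} = c_i \cup c_l$ and $\bar e_{il} = \{i+1, \ldots, d_u\} \setminus e_{il}$, and take $a = (1, \ldots, i, l, \bar e_{il})$ and $b = e_{il}$. With $\mathcal{W}_{aa|b} = \mathrm{In}[\mathcal{K}_{aa}\mathcal{Q}_{aa}\mathcal{K}_{aa}^{\mathsf T}]$ of (2.15), and $\mathcal{W}_{uu}$ the edge matrix of the covariance graph of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$,

in the MAG, $i$ and $l$ are joined by a dashed line for $[\mathcal{W}_{aa|b}]_{il} = 1$, and $i, l$ are uncoupled otherwise. $\qquad (3.14)$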

The corresponding MAG results after inserting or replacing edges in $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ according to (3.13) and (3.14) and keeping just one of several edges of the same kind.

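Proposition 6 (Local Markov properties of summary graphs). Let the edge matrix components $\mathcal{H}_{uN}$, $\mathcal{W}_{uu}$ and $\mathcal{S}^{vv.uM}$ of $G^{V\setminus\{C,M\}}_{\mathrm{sum}}$ be given from Corollary 5. Let node $l$ and sets $c_i$, $e_{il}$ be defined as above, but with their subscripts dropped. Let further $\beta$ denote subsets of nodes uncoupled to node $i$. Then:

(1) $i \perp\!\!\!\perp \beta \mid C\,v\setminus\{i,\beta\} \iff \mathcal{S}^{vv.uM}_{i\beta} = 0$ for $i \in v$ and $\beta \subset v$.

(2) $i \perp\!\!\!\perp \beta \mid C\,v\setminus\beta \iff \mathcal{H}_{i\beta} = 0$ for $i \in u$ and $\beta \subset v$.

(3) $i \perp\!\!\!\perp l \mid C\,v\,e \iff (\mathcal{W}_{il} = 0$ and $\mathcal{W}_{ie}\mathcal{W}^{-}_{ee}\mathcal{W}_{el} = 0)$ for $i \in u$ and $l \in \bar c$.

(4) $i \perp\!\!\!\perp \beta \mid C\,v\,c\setminus\beta \iff (\mathcal{H}_{i\beta} = 0$ and $\mathcal{W}_{ic}\mathcal{W}^{-}_{cc}\mathcal{H}_{c\beta} = 0)$ for $i \in u$ and $\beta \subset c$.

Notice that pairwise independences result if the sets $\beta$ contain single elements.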

Proof of Proposition 6. The independences in (1) within $v$ are those of a concentration graph; see also (2.16) in Example 4. The independences in (2) are those obtained when regressing $Y_{u|C}$ on $Y_{v|C}$; see also Example 2. The independences in (3) and (4) are reformulations of (3.14) and (3.13), respectively. □
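For Gaussian variables, statement (1) rests on the familiar fact that a zero entry in a concentration matrix means a zero conditional covariance given all remaining variables. The following is a minimal numerical sketch, assuming NumPy and a hypothetical $3 \times 3$ concentration matrix with a zero in position $(0, 2)$.

```python
import numpy as np

# Hypothetical concentration matrix of (Y_0, Y_1, Y_2) with K[0, 2] = 0.
K = np.array([[2.0, 0.6, 0.0],
              [0.6, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
Sigma = np.linalg.inv(K)
a, b = [0, 2], [1]

# Covariance matrix of (Y_0, Y_2) given Y_1.
S_aa_b = (Sigma[np.ix_(a, a)]
          - Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)]) @ Sigma[np.ix_(b, a)])
print(np.round(S_aa_b, 10))    # the off-diagonal entry vanishes: Y_0 and Y_2 independent given Y_1
```

The conditional covariance matrix also equals the inverse of the $(a, a)$ block of $K$, which is why the zero carries over.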



4. Discussion 



The common attractive feature of a maximal ancestral graph and of the corresponding 
summary graph is that they elucidate consequences of a possibly much larger generating 
graph regarding independences. The smaller graphs capture the independence structure implied by the generating graph, and they can be used to understand additional consequences of the generating graph for independences that result after additional marginalising and conditioning.
An advantage of the MAG is that each edge corresponds to a conditional association and each missing edge to a conditional independence. A disadvantage of a MAG is that a dependence, say one corresponding to an $ik$-arrow, may be severely distorted compared to the dependence corresponding to the $ik$-arrow in the generating process. With the corresponding summary graph, one can identify which of the conditional dependences in the MAG remain undistorted and which do not.

Given the summary graph, the corresponding MAG is derived in a few steps. But in 
general, one cannot obtain from a given MAG the corresponding summary graph or the 
information about distortions. Both types of graph may contain semi-directed cycles. 
These are typically of interest only in connection with a larger generating process. 

In contrast, their common subclass of regression graphs gives a substantial and much 
needed enlargement of the types of research hypotheses that can be formulated with 
directed acyclic graphs. They model stepwise generating processes not only in univariate 
but also in joint responses. This leads to a corresponding recursive factorization of the 
joint density in these vector variables. 

In addition, every independence constraint for a component of a joint response is 
conditional on variables in the past of the joint response. This is an important distinction 
from all other types of currently known chain graphs and is in line with research in many 
substantive fields where the study of dependences on past variables is judged to be more 
fruitful than those of associations and of independences among variables arising at the 
same time. 

For Gaussian regression graph models, properties of estimators and test statistics have 
been quite well understood for a considerable time. For discrete random variables, all 
regression graph models are smooth; see Drton (2009). Such smooth models are curved 
exponential families (see Cox (2006), Section 6.8) so that they have desirable properties 
regarding estimation and asymptotic properties of tests. 

Much less is known for joint responses of discrete and continuous random components. 
Thus, though we now can derive important consequences of any type of regression graph 
model, more results on equivalence, identification, estimation and goodness-of-fit criteria
are needed. 

However, if the regression graph model can be generated, as discussed, via special 
types of hidden variables in a larger parent graph model, then its independence structure 
is defined by a list of independence statements for variable pairs. This permits local 
fitting with univariate generalized linear models, with checks for linearity, interaction and 
conditional independence based on observed associations of variable pairs and triples. 

This requires no knowledge about the form of the joint distribution and it permits us 
to formulate research hypotheses that are compatible with a given set of data and that 
are to be investigated in further empirical studies.

Appendix: Two-edge paths of summary graphs 

The following arguments show that the types of induced edges of Table 1 are self-consistent. As before, we distinguish nodes to be marginalised over from nodes to be conditioned on.
Figure 8. Active alternating paths that generate two-edge paths (a) of type (4), inducing a dashed line, (b) of type (5), inducing a full line, and (c) of type (6) or (7), inducing an arrow.

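The three types of edge-inducing two-edge paths (1)-(3) in a parent graph, which have as an inner node a transition, a source or a sink node, respectively, are defined to generate the following three different types of edges:

(1) marginalising over the transition node of $i \leftarrow \circ \leftarrow k$ gives the arrow $i \leftarrow k$,

(2) marginalising over the source node of $i \leftarrow \circ \rightarrow k$ gives a dashed line between $i$ and $k$,

(3) conditioning on the sink node of $i \rightarrow \circ \leftarrow k$ gives a full line between $i$ and $k$.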
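These three generating rules can be written as a small lookup. The following is a minimal sketch in plain Python. The string encoding of edges and the function name are hypothetical, and only the parent-graph cases (1)-(3) are covered. Every other combination of inner node and operation returns `None`, meaning that no edge is induced by that two-edge path.

```python
def induced_edge(left, inner, right):
    """Edge between i and k induced by the parent-graph path i <left> o <right> k.

    left and right are "<-" or "->", read from i towards k; inner is
    "marginalise" or "condition". Only the generating cases (1)-(3) are encoded.
    """
    if inner == "marginalise":
        if left == right:                         # (1) i <- o <- k (or i -> o -> k): transition node
            return f"arrow i {left} k"
        if (left, right) == ("<-", "->"):         # (2) i <- o -> k: source node
            return "dashed line i, k"
    if inner == "condition" and (left, right) == ("->", "<-"):   # (3) i -> o <- k: sink node
        return "full line i, k"
    return None


for path in [("<-", "marginalise", "<-"), ("<-", "marginalise", "->"),
             ("->", "condition", "<-"), ("->", "marginalise", "<-")]:
    print(path, induced_edge(*path))
```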

The arrow has one, the dashed line two and the full line no edge endpoints that define a collision node when the edge is mirrored at the same node. Dashed lines denote edges in covariance graphs, and full lines edges in concentration graphs. Closing paths in such graphs is defined to preserve the type of edge:

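(4) conditioning on the inner node of a path of two dashed lines gives a dashed line,

(5) marginalising over the inner node of a path of two full lines gives a full line.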

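The next two paths, (6) and (7), both induce an arrow $i \leftarrow k$:

(6) conditioning on the inner node of a path that consists of a dashed line from $i$ and an arrow pointing from $k$ to the inner node,

(7) marginalising over the inner node of a path that consists of an arrow pointing from the inner node to $i$ and a full line between the inner node and $k$.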

Paths (4)-(7) arise from active alternating paths in a parent graph in which inner source nodes, to be marginalised over, alternate with inner sink nodes, to be conditioned on:

The two-edge paths (4)-(7) result from Figure 8 as follows: path (4) from (a) by 
only marginalising, path (5) from (b) by only conditioning, path (6) from (c) by only 
marginalising and path (7) from (c) by only conditioning. The paths (a)-(c) of Figure 8 
generalize paths (2), (3) and (1), respectively. 

The three remaining edge- inducing paths of two edges in are 

(8) CM — ft O O o, 

(9) o — ft ^ — o=^>o o, 

(10) O ft 0=^OH O. 

The three active paths of Figure 9 result by substituting the undirected edges in (8)-(10) 
by the appropriate generating components (2) or (3). 

By marginalising over the transition node in Figure 9(a)-(c), one generates, respectively, path (2), path (3) and the path in Figure 8(c).

The construction of the summary graph simplifies considerably for special types of parent graphs, for instance for the graphs of the lattice conditional independence models studied by Andersson et al. (1997), and for the graphs corresponding to labeled trees, studied by Castelo and Siebes (2003).





Figure 9. Active paths that generate two-edge paths (a) of type (8) inducing O O, (b) of 

type (9) inducing O O, and (c) of type (10) inducing 0-< — O. 